77 datasets found

Data from: Data transformation: an underestimated tool by inappropriate use
scielo.figshare.com
xls
Updated Jun 1, 2023
Cite
João Paulo Ribeiro-Oliveira; Denise Garcia de Santana; Vanderley José Pereira; Carlos Machado dos Santos (2023). Data transformation: an underestimated tool by inappropriate use [Dataset]. http://doi.org/10.6084/m9.figshare.6083840.v1
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.6083840.v1
Dataset updated
Jun 1, 2023
Dataset provided by
SciELO journals
Authors
João Paulo Ribeiro-Oliveira; Denise Garcia de Santana; Vanderley José Pereira; Carlos Machado dos Santos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT. Some researchers do not recommend data transformation, arguing that it causes problems in inferences and mischaracterises data sets, which can hinder interpretation. Other researchers consider data transformation necessary to meet the assumptions of parametric models. Perhaps the largest group of researchers who use data transformation are concerned with experimental accuracy, which provokes the misuse of this tool. Considering this, our paper offers a study of the most frequent situations related to data transformation and of how this tool can impact ANOVA assumptions and experimental accuracy. Our database was obtained from measurements of seed physiology and seed technology. The coefficient of variation cannot be used as an indicator of data transformation. Data transformation might violate the assumptions of analysis of variance, invalidating the idea that its use will cause the inferences to fail, even if it does not improve the quality of the analysis. The decision about when to use data transformation is dichotomous, but the criteria for this decision are many. The unit (percentage, day or seedlings per day), the experimental design and the possible robustness of F-statistics to ‘small deviations’ from normality are among the main indicators for the choice of the type of transformation.
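The abstract ties the choice of transformation to the unit of the response. As a minimal sketch, the code below applies two transformations that are commonly used for such units: the arcsine square-root for percentages and log(x + 1) for counts or times. The abstract does not name specific transformations, so these choices, the function names and the germination values are all assumptions made for illustration. The coefficient of variation is computed before and after only to show how it can be inspected, not as a criterion for transforming.

```python
import math
import statistics


def arcsine_sqrt(percent: float) -> float:
    """Angular transformation of a percentage (0-100), returned in radians."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percentage must lie in [0, 100]")
    return math.asin(math.sqrt(percent / 100.0))


def log_plus_one(value: float) -> float:
    """log(x + 1), usable for non-negative counts or times that include zero."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.log1p(value)


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical germination percentages for one treatment (not from the dataset).
germination = [92.0, 88.0, 97.0, 100.0, 85.0]
transformed = [arcsine_sqrt(p) for p in germination]

print(f"CV raw: {coefficient_of_variation(germination):.1f}%")
print(f"CV transformed: {coefficient_of_variation(transformed):.1f}%")
```

Whether the transformed values satisfy the ANOVA assumptions would still have to be checked on the residuals of the fitted model; the sketch only shows the transformation step.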
Statistics Software Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 22, 2024
Cite
Dataintelo (2024). Statistics Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-statistics-software-market
Available download formats: csv, pdf, pptx
Dataset updated
Sep 22, 2024
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Statistics Software Market Outlook

The global statistics software market size is projected to grow from USD 10.5 billion in 2023 to USD 18.7 billion by 2032, exhibiting a CAGR of 6.5% over the forecast period. The growth of this market is driven by the increasing adoption of data-driven decision-making processes across various industries, the rising need for statistical modeling and analysis tools, and the growing emphasis on advanced analytics to gain competitive advantages. Additionally, the expanding use of artificial intelligence (AI) and machine learning (ML) technologies to enhance the capabilities of statistics software is contributing significantly to market growth.
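The growth figures above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding periods (2023 to 2032); the function names are illustrative, and the report may use a different base year or rounding.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Figures quoted in the description: USD 10.5 billion (2023), USD 18.7 billion (2032).
# Assumption: nine compounding periods between the two endpoints.
implied_rate = cagr(10.5, 18.7, 2032 - 2023)
projected_2032 = project(10.5, 0.065, 2032 - 2023)

print(f"Implied CAGR 2023-2032: {implied_rate:.2%}")
print(f"10.5 compounded at 6.5% for 9 years: {projected_2032:.1f}")
```

Under the nine-period assumption the two endpoints imply a rate close to the quoted 6.5%; a small gap is what rounding or a different base period would produce.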

One of the primary growth factors of the statistics software market is the increasing reliance on data analytics and business intelligence tools across different sectors. Organizations are leveraging statistical software to analyze large volumes of data generated through various digital channels, enabling them to make informed decisions and identify new business opportunities. This trend is particularly evident in the healthcare, finance, and retail sectors, where data-driven insights are crucial for improving operational efficiency, customer satisfaction, and overall performance.

Another key driver for the market is the proliferation of big data and the need for advanced data management solutions. With the exponential growth of data generated by various sources such as social media, IoT devices, and enterprise systems, there is a heightened demand for robust statistical software that can handle complex data sets and perform sophisticated analyses. This has led to increased investments in the development of innovative statistics software solutions that offer enhanced features and capabilities, such as real-time data processing, predictive analytics, and automated reporting.

The integration of AI and ML technologies into statistics software is also significantly boosting market growth. These technologies enable more accurate and efficient data analysis, allowing organizations to uncover hidden patterns and trends that were previously impossible to detect. AI-powered statistical tools can automate repetitive tasks, reduce human error, and provide deeper insights into data, thereby enhancing the overall decision-making process. As a result, there is a growing adoption of AI-driven statistics software across various industries, further propelling market expansion.

Regionally, North America is expected to maintain its dominance in the statistics software market, owing to the presence of numerous leading software providers, high adoption of advanced analytics solutions, and substantial investments in research and development. However, the Asia Pacific region is anticipated to witness the highest growth rate over the forecast period, driven by the rapid digital transformation of businesses, increasing awareness of data analytics benefits, and supportive government initiatives promoting technological advancements.

Component Analysis

The statistics software market is segmented by component into software and services. The software segment includes various types of statistical analysis tools, ranging from basic data visualization software to advanced predictive analytics platforms. This segment holds the largest market share due to the widespread adoption of software solutions that enable organizations to analyze and interpret data efficiently. The continuous development of innovative features, such as real-time analytics, data mining, and machine learning capabilities, is further driving the demand for statistics software.

In contrast, the services segment encompasses consulting, implementation, training, and support services provided by software vendors and third-party providers. These services are crucial for organizations to effectively utilize statistical software and maximize its benefits. The growing complexity of data and the need for specialized expertise in data analysis are driving the demand for professional services in the statistics software market. Moreover, as more businesses adopt advanced analytics solutions, the need for ongoing support and training services is expected to increase, contributing to the growth of the services segment.

The integration of cloud computing with statistics software is also influencing the component-wise growth of this market. Cloud-based solutions offer several advantages, such as scalability, flexibility, and cost-effectiveness, making them an attractive option for organizations of all sizes. As a result, there is a
Learning Disability Services Monthly Statistics - AT: July 2021, MHSDS: May...
digital.nhs.uk
csv, xlsx
Updated Aug 19, 2021
Cite
(2021). Learning Disability Services Monthly Statistics - AT: July 2021, MHSDS: May 2021 Final [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/learning-disability-services-statistics/at-july-2021-mhsds-may-2021-final
Available download formats: xlsx (1.8 MB), csv (122.9 kB)
Dataset updated
Aug 19, 2021
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Jul 1, 2021 - Jul 31, 2021
Area covered
England
Description
Contains monthly data from the Assuring Transformation dataset. Data is available in Excel or CSV format. PLEASE NOTE: Some updates to the structure and numbering of the data tables and CSV were applied from April 2021, primarily to group similar table types and content together. Additionally, we have increased the number of tables that have time series data retrospectively updated each month (green tabs). We welcome any feedback on this updated format.
Big data and business analytics revenue worldwide 2015-2022
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The global big data and business analytics (BDA) market was valued at ***** billion U.S. dollars in 2018 and is forecast to grow to ***** billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services. IT services is projected to make up around ** billion U.S. dollars, and business services will account for the remainder.

Big data

High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate **** ZBs of data in 2025.

Business analytics

Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around **** billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, a demand for data visualization dashboards, and an increased adoption of cloud.
Evaluating Functional Diversity: Missing Trait Data and the Importance of...
plos.figshare.com
docx
Updated May 30, 2023
Cite
Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello (2023). Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation [Dataset]. http://doi.org/10.1371/journal.pone.0149270
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0149270
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
Digital Transformation Statistics By Trends, Expenditure, Adoption And...
sci-tech-today.com
Updated Jun 25, 2025
Cite
Sci-Tech Today (2025). Digital Transformation Statistics By Trends, Expenditure, Adoption And Predictions [Dataset]. https://www.sci-tech-today.com/stats/digital-transformation-statistics/
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Sci-Tech Today
License
https://www.sci-tech-today.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

Digital Transformation Statistics: Today, businesses are trying to embrace innovative technologies that are also challenging, as they quickly change the digital environment worldwide. Digital transformation statistics involve integrating these technologies to boost productivity, efficiency, and sustainability in operations.

This concept emerged during the COVID-19 pandemic, which heralded an avalanche of more agile and intelligent ways of doing business. The main technologies driving this transformation include artificial intelligence (AI), big data, and cloud computing, which have diverse applications across different sectors. A key trend in 2024 is for companies to adopt new technologies to remain competitive in their respective fields of business.

With a projected $3.7 trillion by the end of this year for the global digital transformation statistics market, it becomes clear that the adoption of cloud computing, automation, and AI has become a major propeller for business growth. As more companies adopt digital strategies, market researchers must understand current trends and statistics that will inform future strategies.
Priorities in digital transformation among businesses Vietnam 2022
statista.com
Updated Jul 5, 2024
Cite
Statista (2024). Priorities in digital transformation among businesses Vietnam 2022 [Dataset]. https://www.statista.com/statistics/1368527/vietnam-digital-transformation-priorities/
Dataset updated
Jul 5, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2022
Area covered
Vietnam
Description
According to a survey conducted in 2022, the majority of respondents from large domestic and multi-national businesses operating in Vietnam identified the establishment of Business Continuity Plans (BCP) and Disaster Recovery plans (DRP), alongside using data analytics platforms as the leading initiative categories in their digital transformation survey. The same survey revealed that 84 percent of respondents have a cloud migration strategy.
Economic Statistics Transformation Programme: enhanced financial accounts...
s3.amazonaws.com
gov.uk
Updated Mar 16, 2020
Cite
Office for National Statistics (2020). Economic Statistics Transformation Programme: enhanced financial accounts (UK flow of funds): historical data for the household and NPISH financial categories AF.6, AF.7 and AF.8 assets and liabilities [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/161/1613491.html
Dataset updated
Mar 16, 2020
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Office for National Statistics
Area covered
United Kingdom
Description
Official statistics are produced impartially and free from political influence.
Digital transformation spending worldwide 2017-2027
statista.com
Updated Jun 23, 2025
Cite
Statista (2025). Digital transformation spending worldwide 2017-2027 [Dataset]. https://www.statista.com/statistics/870924/worldwide-digital-transformation-market-size/
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
In 2024, spending on digital transformation (DX) is projected to reach *** trillion U.S. dollars. By 2027, global digital transformation spending is forecast to reach *** trillion U.S. dollars.

What is digital transformation?

Digital transformation refers to the adoption of digital technology to transform business processes and services from non-digital to digital. This encompasses, among other things, moving data to the cloud, using technological devices and tools for communication and collaboration, and automating processes.

What is driving digital transformation?

Digital transformation growth is due to several contributing factors. Among these was the COVID-19 pandemic, which considerably increased the pace of digital transformation in organizations around the globe in 2020. Although the pandemic is over, working from home among organizations globally has not only persisted but increased, adding to the drive for digital transformation. Other contributing causes include customer demand and the need to be on par with competitors. Overall, utilizing technologies for digital transformation renders organizations more agile in responding to changing markets and enhances innovation, thereby making them more resilient.
ID 2007 Mental Health indicator
data.europa.eu
data.gov.uk
html
Updated Oct 30, 2021
Cite
Office for National Statistics (2021). ID 2007 Mental Health indicator [Dataset]. https://data.europa.eu/data/datasets/id_2007_mental_health_indicator
Available download formats: html
Dataset updated
Oct 30, 2021
Dataset authored and provided by
Office for National Statistics
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Indices of Deprivation (ID) 2004: Health Deprivation and Disability, measure of adults under 60 suffering from mood or anxiety disorders, based on prescribing, suicides, and health benefits data
Source: Communities and Local Government (CLG): ID 2007
Publisher: Neighbourhood Statistics
Geographies: Lower Layer Super Output Area (LSOA)
Geographic coverage: England
Time coverage: 2007
Type of data: Administrative data (with statistical transformations applied)
Notes: These data represent a 'standardised and normalised measure' of mental health problems within an area rather than an absolute count or percentage of mental health problems.
‘California Housing Data (1990)’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Data (1990)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-data-1990-a0c5/b7389540/?iid=007-628&v=presentation
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
California
Description
Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Source

This is the dataset used in this book: https://github.com/ageron/handson-ml/tree/master/datasets/housing to illustrate a sample end-to-end ML project workflow (pipeline). This is a great book - I highly recommend!

The data is based on California Census in 1990.

About the Data (from the book):

"This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

The following is the description from the book author:

This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."

About the Data (From Luís Torgo page):

http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

This is a dataset obtained from the StatLib repository. Here is the included description:

"We collected information on the variables using all the block groups in California from the 1990 Census. In this sample a block group on average includes 1425.5 individuals living in a geographically compact area. Naturally, the geographical area included varies inversely with the population density. We computed distances among the centroids of each block group as measured in latitude and longitude. We excluded all the block groups reporting zero entries for the independent and dependent variables. The final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

End-to-End ML Project Steps (Chapter 2 of the book)

1. Look at the big picture
2. Get the data
3. Discover and visualize the data to gain insights
4. Prepare the data for Machine Learning algorithms
5. Select a model and train it
6. Fine-tune your model
7. Present your solution
8. Launch, monitor, and maintain your system

The 10-Step Machine Learning Project Workflow (My Version)

1. Define the business objective
2. Make sense of the data from a high level
   - data types (number, text, object, etc.)
   - continuous/discrete
   - basic stats (min, max, std, median, etc.) using boxplot
   - frequency via histogram
   - scales and distributions of different features
3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified
4. Correlation analysis (pair-wise and attribute combinations)
5. Data cleaning (missing data, outliers, data errors)
6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
7. Train and cross-validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
8. Fine-tune the model by trying different combinations of hyperparameters
9. Evaluate the model with the best estimators on the test set
10. Launch, monitor, and refresh the model and system
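The cleaning and transformation steps of the workflow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tutorial's own pipeline: the helper names are hypothetical, the rows are made up, and only the column names `total_bedrooms` and `ocean_proximity` come from the description. It fills missing values with the median, standardizes a numeric column, and one-hot encodes a categorical column.

```python
import statistics


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def one_hot(column):
    """Map each category to a 0/1 indicator vector; categories are sorted."""
    categories = sorted(set(column))
    encoded = [[1 if v == c else 0 for c in categories] for v in column]
    return categories, encoded


# Hypothetical rows: the column names follow the description, the values do
# not come from the dataset.
total_bedrooms = [129.0, None, 190.0, 235.0]
ocean_proximity = ["NEAR BAY", "INLAND", "NEAR OCEAN", "INLAND"]

bedrooms_scaled = standardize(impute_median(total_bedrooms))
categories, proximity_encoded = one_hot(ocean_proximity)

# One feature row per record: scaled numeric value followed by indicator flags.
rows = [[b] + flags for b, flags in zip(bedrooms_scaled, proximity_encoded)]
print(categories)
for row in rows:
    print(row)
```

In a real project the median, mean and standard deviation would be computed on the training set only and then reused on the test set, so that no information leaks from the held-out data.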

--- Original source retains full ownership of the source dataset ---
Household Income and Expenditure Survey 2016 - Liberia
microdata.lisgislr.org
catalog.ihsn.org
Updated Oct 17, 2024
Cite
Liberia Institute for Statistics and Geo-Information Services (2024). Household Income and Expenditure Survey 2016 - Liberia [Dataset]. https://microdata.lisgislr.org/index.php/catalog/29
Dataset updated
Oct 17, 2024
Dataset authored and provided by
Liberia Institute for Statistics and Geo-Information Services
Time period covered
2016 - 2017
Area covered
Liberia
Description
Abstract

The main purpose of the Household Income and Expenditure Survey (HIES) 2016 was to offer high-quality, nationally representative household data that provided information on incomes and expenditure in order to update the Consumer Price Index (CPI), improve National Accounts statistics, provide agricultural data and measure poverty as well as other socio-economic indicators. These statistics were urgently required for evidence-based policy making and monitoring of implementation results supported by the Poverty Reduction Strategy (I & II), the Agenda for Transformation (AfT) and the Liberia National Vision 2030. The survey was implemented by the Liberia Institute of Statistics and Geo-Information Services (LISGIS) over a 12-month period, starting in January 2016 and ending in January 2017. LISGIS completed a total of 8,350 interviews, thus providing sufficient observations to make the data statistically significant at the county level. The data captured the effects of seasonality, making it the first of its kind in Liberia. Support for the survey was offered by the Government of Liberia, the World Bank, the European Union, the Swedish International Development Cooperation Agency, the United States Agency for International Development and the African Development Bank. The objectives of the 2016 HIES were:

Update the Consumer Price Index (CPI): To obtain a new set of weights for the basket of goods and services that upgrade the Monrovia Consumer Price Index (MCPI) and the National Consumer Price Index (NCPI) and to revise the CPI basket of goods and services in Liberia to reflect the current consumption pattern of residents.

Improve National Accounts Statistics: To get information on annual household expenditure patterns in order to update the household component of the National Accounts.

Measure Poverty: To prepare robust poverty indices that enable the understanding of poverty dynamics across the country and of the factors influencing them.

Improve Agricultural Statistics: To obtain nationally representative and policy relevant agricultural statistics in order to undertake in-depth analysis of agricultural households.

Capture Socio-economic Impact of Ebola Virus Disease (EVD): To obtain a post-EVD dataset which allows for an in-depth analysis of the socioeconomic impact of EVD on households.

Benchmark Agenda for Transformation Indicators: To provide an update on selected socioeconomic indicators used to benchmark the government’s policies embedded within the Agenda for Transformation.

Develop Statistical Capacity: Emphasize capacity building and development of sustainable statistical systems through every stage of the project to produce accurate and timely information about Liberia.

Geographic coverage

National

Analysis unit

Households

Individuals

Kind of data

Sample survey data [ssd]

Sampling procedure

The original sample design for the HIES exploited two-phased clustered sampling methods, encompassing a nationally representative sample of households in every quarter and was obtained using the 2008 National Housing and Population Census sampling frame. The procedures used for each sampling stage are as follows:
i. First stage
Selection of sample EAs. The sample EAs for the 2016 HIES were selected within each stratum systematically with Probability Proportional to Size from the ordered list of EAs in the sampling frame. They are selected separately for each county by urban/rural stratum. The measure of size for each EA was based on the number of households from the sampling frame of EAs based on the 2008 Liberia Census. Within each stratum the EAs were ordered geographically by district, clan and EA codes. This provided implicit geographic stratification of the sampling frame.

ii. Second stage
Selection of sample households within a sample EA. A random systematic sample of 10 households was selected from the listing for each sample EA. Using this type of table, the supervisor only has to look up the total number of households listed, and a specific systematic sample of households is identified in the corresponding row of the table.
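The two sampling stages described above can be sketched as follows. This is an illustration under stated assumptions rather than the survey's actual procedure: the function names, the EA sizes and the seed are hypothetical, and the survey itself used a printed lookup table for the second stage instead of code.

```python
import random


def pps_systematic(sizes, n_select, rng):
    """Systematic PPS: pick n_select indices with probability proportional to size.

    `sizes` must already be in the frame's order (e.g. district, clan, EA code).
    A unit larger than the sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n_select
    start = rng.uniform(0, interval)
    targets = [start + k * interval for k in range(n_select)]

    selected, cumulative, idx = [], 0.0, 0
    for target in targets:
        # Advance until the cumulative size range of unit `idx` covers the target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def systematic_sample(n_listed, n_select, rng):
    """Equal-probability systematic sample of positions 0..n_listed-1."""
    if n_listed <= n_select:
        return list(range(n_listed))
    interval = n_listed / n_select
    start = rng.uniform(0, interval)
    return [min(int(start + k * interval), n_listed - 1) for k in range(n_select)]


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
ea_households = [120, 85, 240, 60, 150, 95, 310, 70]  # hypothetical EA sizes

# Stage 1: select EAs within one stratum; stage 2: 10 households per selected EA.
sample_eas = pps_systematic(ea_households, n_select=3, rng=rng)
households = {ea: systematic_sample(ea_households[ea], 10, rng) for ea in sample_eas}

for ea, picked in households.items():
    print(f"EA {ea} ({ea_households[ea]} households listed): {picked}")
```

Because the frame is ordered geographically before the first stage, the fixed interval spreads the selected EAs across the ordering, which is the implicit stratification the description mentions.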

Mode of data collection

Face-to-face [f2f]

Research instrument

There were three questionnaires administered for this survey:
1. Household and Individual Questionnaire
2. Market Price Questionnaire
3. Agricultural Recall Questionnaire

Cleaning operations

The data entry clerk for each team, using data entry software called CSPro, entered data for each household in the field. For each household, an error report was generated on-site, which identified key problems with the data collected (outliers, incorrect entries, inconsistencies with skip patterns, basic filters for age and gender specific questions etc.). The Supervisor along with the Data Entry Clerk and the Enumerator that collected the data reviewed these errors. Callbacks were made to households if necessary to verify information and rectify the errors while in that EA.

Once the data were collected in each EA, they were sent to LISGIS headquarters for further processing along with EA reports for each area visited. The HIES Technical committee converted the data into STATA and ran several consistency checks to manage overall data quality and prepared reports to identify key problems with the data set and called the field teams to update them about the same. Monthly reports were prepared by summarizing observations from data received from the field alongside statistics on data collection status to share with the field teams and LISGIS Management.
Index of Multiple Deprivation (IMD) 2007
data.europa.eu
html
Cite
Office for National Statistics, Index of Multiple Deprivation (IMD) 2007 [Dataset]. https://data.europa.eu/data/datasets/index_of_multiple_deprivation_imd_2007?locale=pt
Available download formats: html
Dataset authored and provided by
Office for National Statistics
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Index of Multiple Deprivation 2007: Measure of multiple deprivation at small area level made up of seven domains
Source: Communities and Local Government (CLG): ID 2007
Publisher: Neighbourhood Statistics
Geographies: Lower Layer Super Output Area (LSOA)
Geographic coverage: England
Time coverage: 2007 (using data from 2001 to 2005)
Type of data: Administrative data (with statistical transformations applied)
ID 2004 Combined Road Distance to Services indicator
gimi9.com
cloud.csiss.gmu.edu
Updated Jan 21, 2010
(2010). ID 2004 Combined Road Distance to Services indicator [Dataset]. https://gimi9.com/dataset/uk_id_2004_combined_road_distance_to_services_indicator/
Dataset updated
Jan 21, 2010
Description
Road distances to nearest General Practice (GP) premises, primary schools, post offices and supermarket/convenience stores. Source: Office of the Deputy Prime Minister (ODPM): ID 2004 Publisher: Neighbourhood Statistics Geographies: Lower Layer Super Output Area (LSOA) Geographic coverage: England Time coverage: 2004 (using data from 2001 to 2003) Type of data: Administrative data (with statistical transformations applied)
ID 2007 Extent
cloud.csiss.gmu.edu
data.europa.eu
html
Updated Dec 20, 2019
United Kingdom (2019). ID 2007 Extent [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/id_2007_extent
Dataset updated
Dec 20, 2019
Dataset provided by
United Kingdom
License
http://reference.data.gov.uk/id/open-government-licence
Description
ID 2007 Extent Score: Proportion of a district's population living in the most deprived SOAs in the country Source: Communities and Local Government (CLG): ID 2007 Publisher: Neighbourhood Statistics Geographies: Local Authority District (LAD), County/Unitary Authority Geographic coverage: England Time coverage: 2007 Type of data: Administrative data (with statistical transformations applied)
ID 2007 Income Scale
data.europa.eu
html
Office for National Statistics, ID 2007 Income Scale [Dataset]. https://data.europa.eu/data/datasets/id_2007_income_scale
Dataset authored and provided by
Office for National Statistics
License
http://reference.data.gov.uk/id/open-government-licence
Description
ID 2007 Income Scale: Number of people income deprived Source: Communities and Local Government (CLG): ID 2007 Publisher: Neighbourhood Statistics Geographies: Local Authority District (LAD), County/Unitary Authority Geographic coverage: England Time coverage: 2007 (using data from 2005 to 2006) Type of data: Administrative data (with statistical transformations applied)
ID 2007 Health Domain
cloud.csiss.gmu.edu
data.europa.eu
html
Updated Dec 19, 2019
United Kingdom (2019). ID 2007 Health Domain [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/id_2007_health_domain
Dataset updated
Dec 19, 2019
Dataset provided by
United Kingdom
License
http://reference.data.gov.uk/id/open-government-licence
Description
ID 2007 Health Deprivation and Disability domain (high rates of premature death, poor health or disability) Source: Communities and Local Government (CLG): ID 2007 Publisher: Neighbourhood Statistics Geographies: Lower Layer Super Output Area (LSOA) Geographic coverage: England Time coverage: 2007 (using data from 2001 to 2005) Type of data: Administrative data (with statistical transformations applied)
Modular Data-Transformation Modelling with Geospatial Semantic Array...
figshare.com
png
Updated Jan 18, 2016
Daniele de Rigo (2016). Modular Data-Transformation Modelling with Geospatial Semantic Array Programming [Dataset]. http://doi.org/10.6084/m9.figshare.842695.v5
Unique identifier
https://doi.org/10.6084/m9.figshare.842695.v5
Dataset updated
Jan 18, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniele de Rigo
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
de Rigo, D., Modular Data-Transformation Modelling with Geospatial Semantic Array Programming. FigShare Digital Science. DOI: 10.6084/m9.figshare.842695

Modular Data-Transformation Modelling with Geospatial Semantic Array Programming

Daniele de Rigo

Summary. Wide-scale transdisciplinary modelling for environment (WSTMe) is a scientific challenge with an increasingly important role in allowing strategic policy-making to be effectively discussed and programmed with the support of robust science [1]. Natural resources such as forests, water and soil, along with climate and human-driven changes, are subject to a network of interactions, whose large scale effects may be significant. WSTMe raises challenging issues when the characteristic heterogeneity of available geospatial information, complexity of systems and multiple sources of uncertainty (including those related to scientific software [2]) may affect the robustness, transparency and comprehensibility of hypotheses and results. In this respect, earth observation and computational science [3,4] are intrinsically linked and expected to deal with such a modular array of transdisciplinary aspects while preserving as much as possible conciseness and a terse semantics [5]. This is desirable in order to better communicate key messages and issues, both among different scientific communities and at the science-policy interface. Geospatial Semantic Array Programming (GeoSemAP) is a new approach [6] for WSTMe that has recently emerged in which a concise integration is introduced among semantics, geospatial tools and the array of data-transformation models (D-TM). WSTMe may often be described as a composition of D‑TMs where the flow of initial and derived/intermediate geo‑data highlights its array-based modular structure and semantics. Transparency (even due to the open science approach) is also a goal, to aid society in clearly understanding and controlling the implications of the technical apparatus on collective environmental decision-making [1–6].

Caption of the image. Wide-scale transdisciplinary modelling for environment (WSTMe) may often be described as a composition of data-transformation models (D‑TM) where the flow of initial and derived/intermediate geo‑data highlights its array-based modular structure and semantics (Geospatial Semantic Array Programming, GeoSemAP). Sources: [2,6].

References
[1] van der Sluijs, J. P., 2005. Uncertainty as a Monster in the Science-Policy Interface: Four Coping Strategies. Water Science & Technology 52 (6), 87-92. http://scholar.google.com/scholar?cluster=3385318353116653032
[2] de Rigo, D., 2013. Software Uncertainty in Integrated Environmental Modelling: the role of Semantics and Open Science. Geophysical Research Abstracts 15, 13292+. http://scholar.google.com/scholar?cluster=13790404181931852043
[3] Peng, R. D., 2011. Reproducible Research in Computational Science. Science 334 (6060), 1226-1227. http://scholar.google.com/scholar?cluster=905554772905069177
[4] Morin, A., Urban, J., Adams, P. D., Foster, I., Sali, A., Baker, D., Sliz, P., 2012. Shining Light into Black Boxes. Science 336 (6078), 159-160. http://scholar.google.com/scholar?cluster=12575758499484368256
[5] de Rigo, D., 2012. Semantic Array Programming for Environmental Modelling: Application of the Mastrave Library. In: Seppelt, R., Voinov, A. A., Lange, S., Bankamp, D. (Eds.), International Environmental Modelling and Software Society (iEMSs) 2012 International Congress on Environmental Modelling and Software. Managing Resources of a Limited Planet: Pathways and Visions under Uncertainty, Sixth Biennial Meeting. pp. 1167-1176. http://scholar.google.com/scholar?cluster=6628751141895151391
[6] de Rigo, D., Corti, P., Caudullo, G., McInerney, D., Di Leo, M., San-Miguel-Ayanz, J., 2013. Toward Open Science at the European Scale: Geospatial Semantic Array Programming for Integrated Environmental Modelling. Geophysical Research Abstracts 15, 13245+. http://scholar.google.com/scholar?cluster=17118262245556811911
Data from: Insights into the Effects of Violating Statistical Assumptions...
acs.figshare.com
xlsx
Updated Jun 9, 2023
Amber O. Brown; Peter J. Green; Greta J. Frankham; Barbara H. Stuart; Maiken Ueland (2023). Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical “‑omics” Data with Multiple Explanatory Variables [Dataset]. http://doi.org/10.1021/acsomega.3c01613.s002
Unique identifier
https://doi.org/10.1021/acsomega.3c01613.s002
Dataset updated
Jun 9, 2023
Dataset provided by
ACS Publications
Authors
Amber O. Brown; Peter J. Green; Greta J. Frankham; Barbara H. Stuart; Maiken Ueland
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Biological volatilome analysis is inherently complex due to the considerable number of compounds (i.e., dimensions) and differences in peak areas by orders of magnitude, between and within compounds found within datasets. Traditional volatilome analysis relies on dimensionality reduction techniques which aid in the selection of compounds that are considered relevant to respective research questions prior to further analysis. Currently, compounds of interest are identified using either supervised or unsupervised statistical methods which assume the data residuals are normally distributed and exhibit linearity. However, biological data often violate the statistical assumptions of these models related to normality and the presence of multiple explanatory variables which are innate to biological samples. In an attempt to address deviations from normality, volatilome data can be log transformed. However, whether the effects of each assessed variable are additive or multiplicative should be considered prior to transformation, as this will impact the effect of each variable on the data. If assumptions of normality and variable effects are not investigated prior to dimensionality reduction, ineffective or erroneous compound dimensionality reduction can impact downstream analyses. It is the aim of this manuscript to assess the impact of single and multivariable statistical models with and without the log transformation to volatilome dimensionality reduction prior to any supervised or unsupervised classification analysis. As a proof of concept, Shingleback lizard (Tiliqua rugosa) volatilomes were collected across their species distribution and from captivity and were assessed. Shingleback volatilomes are suspected to be influenced by multiple explanatory variables related to habitat (Bioregion), sex, parasite presence, total body volume, and captive status. 
This work determined that the exclusion of relevant multiple explanatory variables from analysis overestimates the effect of Bioregion and the identification of significant compounds. The log transformation increased the number of compounds that were identified as significant, as did analyses that assumed that residuals were normally distributed. Among the methods considered in this work, the most conservative form of dimensionality reduction was achieved through analyzing untransformed data using Monte Carlo tests with multiple explanatory variables.
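
The point about additive versus multiplicative effects can be made concrete with a small numerical sketch. It does not use the study's data or models: the baseline peak area and the two effect sizes are invented, and the variables `effect_a` and `effect_b` are hypothetical stand-ins for any two explanatory variables. When effects multiply, the raw-scale difference attributable to one variable depends on the level of the other, whereas on the log scale it is constant.

```python
import math

# Hypothetical values, chosen only for illustration.
baseline = 100.0   # peak area with neither effect present
effect_a = 2.0     # variable A doubles the peak area
effect_b = 5.0     # variable B multiplies the peak area by five


def peak_area(a_present: bool, b_present: bool) -> float:
    """Peak area under purely multiplicative effects."""
    area = baseline
    if a_present:
        area *= effect_a
    if b_present:
        area *= effect_b
    return area


# Raw scale: the change due to A depends on whether B is present.
raw_diff_without_b = peak_area(True, False) - peak_area(False, False)
raw_diff_with_b = peak_area(True, True) - peak_area(False, True)

# Log scale: the change due to A is the same in both cases, log(effect_a).
log_diff_without_b = math.log(peak_area(True, False)) - math.log(peak_area(False, False))
log_diff_with_b = math.log(peak_area(True, True)) - math.log(peak_area(False, True))
```

If the effects were additive instead, the opposite would hold: the raw differences would be constant and the log transformation would make them depend on the other variable, which is why the structure of the effects matters before transforming.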
Impact of AI on work performance 2023, by skill level
statista.com
ai-chatbox.pro
Updated Jun 28, 2024
Bergur Thormundsson (2024). Impact of AI on work performance 2023, by skill level [Dataset]. https://www.statista.com/topics/6778/digital-transformation/
Dataset updated
Jun 28, 2024
Dataset provided by
Statista (http://statista.com/)
Authors
Bergur Thormundsson
Description
As of 2023, artificial intelligence (AI) has been shown to improve work performance for both lower-skilled and higher-skilled workers. The improvement gained from using AI was larger for lower-skilled workers, who reached a performance score of 6.06, but higher-skilled workers continued to perform better both with and without the technology.