52 datasets found
  1. Data_Sheet_1_iMAP: A Web Server for Metabolomics Data Integrative Analysis

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Di Zhou; Wenjia Zhu; Tao Sun; Yang Wang; Yi Chi; Tianlu Chen; Jingchao Lin (2023). Data_Sheet_1_iMAP: A Web Server for Metabolomics Data Integrative Analysis.PDF [Dataset]. http://doi.org/10.3389/fchem.2021.659656.s001
    Available download formats:
    pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Di Zhou; Wenjia Zhu; Tao Sun; Yang Wang; Yi Chi; Tianlu Chen; Jingchao Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metabolomics data analysis depends on the use of bioinformatics tools. To meet the evolving needs of metabolomics research, several integrated platforms have been developed. Our group previously developed a desktop platform, IP4M (Integrated Platform for Metabolomics Data Analysis), which allows users to perform a nearly complete metabolomics data analysis in one place. As IP4M came into wide use, users worldwide increasingly requested a web version and a more customizable workflow. We therefore developed iMAP (Integrated Metabolomics Analysis Platform), with extended functions, improved performance, and a redesigned structure. Compared with existing platforms, iMAP offers more methods and usage modes. A new module provides an automatic pipeline for train-test set separation, feature selection, and predictive model construction and validation. Another new module offers extensive editable parameters for network construction, visualization, and analysis. In addition, many plotting tools have been upgraded to produce highly customized, publication-ready figures. Overall, iMAP is a good alternative with functions complementary to existing metabolomics data analysis platforms. iMAP is freely available for academic use at https://imap.metaboprofile.cloud/ (License MPL 2.0).
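
    The modeling module is described only at a high level above. The short Python sketch below illustrates the general pattern it automates (train-test separation, feature selection, and predictive model construction with cross-validation), using scikit-learn on synthetic data; it is not iMAP code, and all names in it are illustrative.

    ```python
    # Illustrative sketch only -- not the iMAP implementation.
    import numpy as np
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))      # 100 samples x 50 metabolite features (synthetic)
    y = rng.integers(0, 2, size=100)    # binary outcome, e.g., case vs. control

    # Train-test set separation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Feature selection + predictive model in one pipeline
    model = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(f_classif, k=10)),     # simple univariate feature selection
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    # Validation: cross-validation on the training set, then a held-out test score
    print("CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean().round(2))
    model.fit(X_train, y_train)
    print("Test accuracy:", round(model.score(X_test, y_test), 2))
    ```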

  2. Data from: Best Management Practices Statistical Estimator (BMPSE) Version 1.2.0

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 27, 2025
    Cite
    U.S. Geological Survey (2025). Best Management Practices Statistical Estimator (BMPSE) Version 1.2.0 [Dataset]. https://catalog.data.gov/dataset/best-management-practices-statistical-estimator-bmpse-version-1-2-0
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The Best Management Practices Statistical Estimator (BMPSE) version 1.2.0 was developed by the U.S. Geological Survey (USGS), in cooperation with the Federal Highway Administration (FHWA) Office of Project Delivery and Environmental Review, to provide planning-level information about the performance of structural best management practices for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway and urban runoff on the Nation's receiving waters (Granato 2013, 2014; Granato and others, 2021). The BMPSE was assembled by using a Microsoft Access® database application to facilitate calculation of BMP performance statistics. Granato (2014) developed quantitative methods to estimate values of the trapezoidal-distribution statistics, correlation coefficients, and the minimum irreducible concentration (MIC) from available data. Granato (2014) developed the BMPSE to hold and process data from the International Stormwater Best Management Practices Database (BMPDB, www.bmpdatabase.org). Version 1.0 of the BMPSE contained a subset of the data from the 2012 version of the BMPDB; the current version of the BMPSE (1.2.0) contains a subset of the data from the December 2019 version of the BMPDB. Selected data from the BMPDB were screened for import into the BMPSE in consultation with Jane Clary, the data manager for the BMPDB. Modifications included identifying water quality constituents, making measurement units consistent, identifying paired inflow and outflow values, and converting BMPDB water quality values set as half the detection limit back to the detection limit. Total polycyclic aromatic hydrocarbon (PAH) values were added to the BMPSE from BMPDB data; they were calculated from individual PAH measurements at sites with enough data to calculate totals.

    The BMPSE tool can sort and rank the data, calculate plotting positions, calculate initial estimates, and calculate potential correlations to facilitate the distribution-fitting process (Granato, 2014). For water-quality ratio analysis, the BMPSE generates the input files and the list of filenames for each constituent within the Graphical User Interface (GUI). The BMPSE calculates the Spearman's rho (ρ) and Kendall's tau (τ) correlation coefficients with their respective 95-percent confidence limits and the probability that each correlation coefficient value is not significantly different from zero by using standard methods (Granato, 2014). If the 95-percent confidence limit values are of the same sign, then the correlation coefficient is statistically different from zero. For hydrograph extension, the BMPSE calculates ρ and τ between the inflow volume and the hydrograph-extension values (Granato, 2014). For volume reduction, the BMPSE calculates ρ and τ between the inflow volume and the ratio of outflow to inflow volumes (Granato, 2014). For water-quality treatment, the BMPSE calculates ρ and τ between the inflow concentrations and the ratio of outflow to inflow concentrations (Granato, 2014; 2020). The BMPSE also calculates ρ between the inflow and the outflow concentrations when a water-quality treatment analysis is done. The current version (1.2.0) of the BMPSE also has the option to calculate urban-runoff quality statistics from inflows to BMPs by using computer code developed for the Highway Runoff Database (Granato and Cazenas, 2009; Granato, 2019).

    References:
    Granato, G.E., 2013, Stochastic empirical loading and dilution model (SELDM) version 1.0.0: U.S. Geological Survey Techniques and Methods, book 4, chap. C3, 112 p., CD-ROM, https://pubs.usgs.gov/tm/04/c03.
    Granato, G.E., 2014, Statistics for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater runoff best management practices (BMPs): U.S. Geological Survey Scientific Investigations Report 2014–5037, 37 p., http://dx.doi.org/10.3133/sir20145037.
    Granato, G.E., 2019, Highway-Runoff Database (HRDB) Version 1.1.0: U.S. Geological Survey data release, https://doi.org/10.5066/P94VL32J.
    Granato, G.E., and Cazenas, P.A., 2009, Highway-Runoff Database (HRDB Version 1.0)--A data warehouse and preprocessor for the stochastic empirical loading and dilution model: Washington, D.C., U.S. Department of Transportation, Federal Highway Administration, FHWA-HEP-09-004, 57 p., https://pubs.usgs.gov/sir/2009/5269/disc_content_100a_web/FHWA-HEP-09-004.pdf.
    Granato, G.E., Spaetzel, A.B., and Medalie, L., 2021, Statistical methods for simulating structural stormwater runoff best management practices (BMPs) with the stochastic empirical loading and dilution model (SELDM): U.S. Geological Survey Scientific Investigations Report 2020–5136, 41 p., https://doi.org/10.3133/sir20205136.
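
    The BMPSE itself is a Microsoft Access application, so the following Python fragment is only an illustration (on synthetic numbers) of the check described above: compute Spearman's rho and Kendall's tau for paired inflow values and outflow/inflow ratios, form an approximate 95-percent confidence interval for rho, and test whether that interval excludes zero.

    ```python
    # Illustrative sketch, not BMPSE code: rank correlations with a 95-percent CI check.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    inflow = rng.lognormal(mean=3.0, sigma=1.0, size=40)        # synthetic inflow concentrations
    ratio = np.clip(rng.normal(0.6, 0.2, size=40), 0.05, None)  # synthetic outflow/inflow ratios

    rho, rho_p = stats.spearmanr(inflow, ratio)
    tau, tau_p = stats.kendalltau(inflow, ratio)

    # Approximate 95-percent confidence limits for rho via the Fisher z-transform
    n = len(inflow)
    z = np.arctanh(rho)
    se = 1.06 / np.sqrt(n - 3)     # common large-sample approximation for Spearman's rho
    lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)

    print(f"rho = {rho:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), p = {rho_p:.3f}")
    print(f"tau = {tau:.3f}, p = {tau_p:.3f}")
    print("rho significantly different from zero:", (lo > 0) or (hi < 0))
    ```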

  3. Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Central Statistical Organization (CSO)
    Kurdistan Regional Statistics Office (KRSO)
    Economic Research Forum
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of the Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual, and health modules collected in the context of the above-mentioned survey. The sample was then used to create a harmonized health survey that is comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 microdata set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a pioneer in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). Under the cooperation between the CSO and the World Bank, the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched IHSES fieldwork on 1/1/2012. The survey was carried out over a full year and covered all governorates, including those in the Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement, and to monitor, evaluate, and update the implementation of the National Poverty Reduction Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and to prepare indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that support decision-making related to production, consumption, exports, and imports.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for formulating a new consumer price index.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variable names, labels, and some definitions. See "Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf", provided in the external resources, for further information on the mapping of the original variables to the harmonized ones, as well as notes on variable availability in both survey years and related comments.

    Geographic coverage

    National coverage: a sample of urban, rural, and metropolitan areas in all governorates, including those in the Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25,488 households for the whole of Iraq: 216 households in each of the 118 districts, organized into 2,832 clusters of 9 households each, distributed across districts and governorates and across urban and rural areas.

    ----> Sample frame:

    The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district), for urban and rural areas, were selected systematically with probability proportional to size, yielding 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster, so the total sample comprised 25,488 households distributed across the governorates, 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, with implicit stratification by urban/rural status and by geography (sub-district, quarter, street, county, village, and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. Sampling frames for both stages could be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units could yield fewer than 24 distinct units; in such cases a sampling unit was selected more than once, so two or more clusters could be drawn from the same enumeration unit when necessary.
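
    A minimal sketch of this two-stage design, on synthetic data, is given below: stage 1 selects 24 blocks per district systematically with probability proportional to size, and stage 2 selects 9 households per selected block by systematic equal-probability sampling. Block counts and sizes are made up for illustration.

    ```python
    # Minimal sketch of the two-stage selection described above (synthetic frame).
    import numpy as np

    rng = np.random.default_rng(2012)

    def systematic_pps(sizes, n_select, rng):
        """Systematic PPS selection; a large unit may be selected more than once."""
        cum = np.cumsum(sizes)
        step = cum[-1] / n_select
        start = rng.uniform(0, step)
        points = start + step * np.arange(n_select)
        return np.searchsorted(cum, points, side="right")

    def systematic_sample(n_units, n_select, rng):
        """Systematic equal-probability selection of n_select units out of n_units."""
        step = n_units / n_select
        start = rng.uniform(0, step)
        return (start + step * np.arange(n_select)).astype(int)

    block_sizes = rng.integers(50, 400, size=300)                   # households per block in one district
    blocks = systematic_pps(block_sizes, n_select=24, rng=rng)      # stage 1: 24 sample points
    clusters = [systematic_sample(block_sizes[b], 9, rng) for b in blocks]  # stage 2: 9 households each
    print(len(blocks), "sample points,", 9 * len(blocks), "households in this district")
    ```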

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was adopted in designing the questionnaire of the 2012 survey, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants, and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, and the final version was then used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts, each with several sections:

    Part 1: Socio-Economic Data:
    - Section 1: Household Roster
    - Section 2: Emigration
    - Section 3: Food Rations
    - Section 4: Housing
    - Section 5: Education
    - Section 6: Health
    - Section 7: Physical measurements
    - Section 8: Job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures:
    - Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
    - Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
    - Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
    - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
    - Section 12, Table 1: Meals Had Within the Residential Unit
    - Section 12, Table 2: Number of Persons Participating in the Meals Within Household Expenditure Other Than its Members

    Part 3: Income and Other Data:
    - Section 13: Job
    - Section 14: Paid jobs
    - Section 15: Agriculture, forestry and fishing
    - Section 16: Household non-agricultural projects
    - Section 17: Income from ownership and transfers
    - Section 18: Durable goods
    - Section 19: Loans, advances and subsidies
    - Section 20: Shocks and strategies of dealing with them in the household
    - Section 21: Time use
    - Section 22: Justice
    - Section 23: Satisfaction in life
    - Section 24: Food consumption during the past 7 days

    Part 4: Diary of Daily Expenditures: The diary of expenditure is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.) during 7 days. Two pages are allocated for recording the expenditures of each day, so the diary consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
    1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local supervisor: checks that the questions have been completed correctly.
    3. Statistical analysis: after the data files are exported from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
    4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STATA to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameters for each variable.
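
    As a simple illustration of this kind of consistency checking (in Python/pandas rather than the SPSS and STATA programs actually used), the sketch below flags irregular or non-logical values in a toy household table so they can be sent back for auditing. Variable names and rules are invented for the example.

    ```python
    # Toy rule-based consistency checks; not the CSO/World Bank editing programs.
    import pandas as pd

    households = pd.DataFrame({
        "hh_id": [1, 2, 3, 4],
        "head_age": [34, 251, 45, 17],                       # 251 is non-logical
        "monthly_food_exp": [420.0, -50.0, 380.0, 510.0],    # negative expenditure is invalid
        "hh_size": [6, 4, 0, 5],                             # a household cannot have 0 members
    })

    rules = {
        "head_age": lambda s: s.between(10, 110),
        "monthly_food_exp": lambda s: s >= 0,
        "hh_size": lambda s: s >= 1,
    }

    violations = pd.DataFrame({col: ~check(households[col]) for col, check in rules.items()})
    to_review = households[violations.any(axis=1)]
    print(to_review)     # records flagged for auditing and correction
    ```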

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data are saved at the household and individual levels in SPSS and then converted to STATA for dissemination.

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest was in Sulaimaniya (92%).

  4. DataSheet1_ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods

    • frontiersin.figshare.com
    pdf
    Updated Jun 10, 2023
    Cite
    Anders Hagen Jarmund; Torfinn Støve Madssen; Guro F. Giskeødegård (2023). DataSheet1_ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods.pdf [Dataset]. http://doi.org/10.3389/fmolb.2022.962431.s001
    Available download formats:
    pdf
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers
    Authors
    Anders Hagen Jarmund; Torfinn Støve Madssen; Guro F. Giskeødegård
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The increasing availability of multivariate data within biomedical research calls for appropriate statistical methods that can describe and model complex relationships between variables. The extended ANOVA simultaneous component analysis (ASCA+) framework combines general linear models and principal component analysis (PCA) to decompose and visualize the separate effects of experimental factors. It has recently been demonstrated how linear mixed models can be included in the framework to analyze data from longitudinal experimental designs with repeated measurements (RM-ASCA+). The ALASCA package for R makes the ASCA+ framework accessible for general use and includes multiple methods for validation and visualization. The package is especially useful for longitudinal data, and the ability to easily adjust for covariates is an important strength. This paper demonstrates how the ALASCA package can be applied to gain insights into multivariate data from interventional as well as observational designs. Publicly available data sets from four studies (proteomics, metabolomics, and transcriptomics) are used to demonstrate the available methods.
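
    For readers unfamiliar with the ASCA idea, the fragment below is a conceptual Python sketch (it does not use or mirror the ALASCA R API): each variable is decomposed by a simple effect model for one experimental factor (group means after centering), and PCA is then applied to the resulting effect matrix to summarize that factor's multivariate effect.

    ```python
    # Conceptual ASCA-style decomposition on synthetic data; not ALASCA itself.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(7)
    n_per_group, n_vars = 20, 30
    groups = np.repeat([0, 1, 2], n_per_group)          # one experimental factor with 3 levels
    level_effects = np.vstack([np.zeros(n_vars),
                               rng.normal(0, 1, n_vars),
                               rng.normal(0, 1, n_vars)])
    X = level_effects[groups] + rng.normal(0, 0.5, (groups.size, n_vars))

    X_centered = X - X.mean(axis=0)
    # Effect matrix: each observation replaced by its group mean (the factor's estimated effect)
    group_means = np.array([X_centered[groups == g].mean(axis=0) for g in np.unique(groups)])
    effect_matrix = group_means[groups]

    pca = PCA(n_components=2)
    scores = pca.fit_transform(effect_matrix)            # ASCA scores for the factor effect
    print("Explained variance ratio:", pca.explained_variance_ratio_.round(3))
    ```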

  5. Understanding and Managing Missing Data.pdf

    • figshare.com
    pdf
    Updated Jun 9, 2025
    Cite
    Ibrahim Denis Fofanah (2025). Understanding and Managing Missing Data.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.29265155.v1
    Available download formats:
    pdf
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ibrahim Denis Fofanah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling. Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
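
    The strategies listed above are easy to try in code. The sketch below (pandas and scikit-learn, on a toy table) contrasts listwise deletion, mean imputation, and a regression-based imputer; it is a generic illustration, not material from the document itself.

    ```python
    # Toy comparison of deletion and imputation strategies for missing values.
    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
    from sklearn.impute import IterativeImputer

    df = pd.DataFrame({
        "age":    [23, 35, np.nan, 52, 46, np.nan],
        "income": [31000, np.nan, 42000, 58000, np.nan, 39000],
    })

    deleted = df.dropna()        # listwise deletion (defensible mainly under MCAR)
    mean_imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns)
    model_imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df), columns=df.columns)
    # IterativeImputer is regression-based; stochastic variants add noise to each imputed draw.

    print(deleted, mean_imputed.round(1), model_imputed.round(1), sep="\n\n")
    ```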

  6. Data from: U.S. Geological Survey Hydrologic Toolbox Software Archive

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 8, 2025
    Cite
    U.S. Geological Survey (2025). U.S. Geological Survey Hydrologic Toolbox Software Archive [Dataset]. https://catalog.data.gov/dataset/u-s-geological-survey-hydrologic-toolbox-software-archive
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This software archive is superseded by Hydrologic Toolbox v1.1.0, available at the following citation: Barlow, P.M., McHugh, A.R., Kiang, J.E., Zhai, T., Hummel, P., Duda, P., and Hinz, S., 2024, U.S. Geological Survey Hydrologic Toolbox version 1.1.0 software archive: U.S. Geological Survey software release, https://doi.org/10.5066/P13VDNAK.

    The U.S. Geological Survey Hydrologic Toolbox is a Windows-based desktop software program that provides a graphical and mapping interface for analysis of hydrologic time-series data with a set of widely used and standardized computational methods. The software combines the analytical and statistical functionality provided in the U.S. Geological Survey (USGS) Groundwater (Barlow and others, 2014) and Surface-Water (Kiang and others, 2018) Toolboxes and provides several enhancements to these programs. The main analysis methods are the computation of hydrologic-frequency statistics such as the 7-day minimum flow that occurs on average only once every 10 years (7Q10); the computation of design flows, including biologically based flows; the computation of flow-duration curves and duration hydrographs; eight computer-programming methods for hydrograph separation of a streamflow time series, including the BFI (base-flow index), HYSEP, PART, and SWAT Bflow methods and Eckhardt's two-parameter digital-filtering method; and the RORA recession-curve displacement method and associated RECESS program to estimate groundwater-recharge values from streamflow data. Several of the statistical methods provided in the Hydrologic Toolbox are used primarily for computation of critical low-flow statistics. The Hydrologic Toolbox also facilitates retrieval of streamflow and groundwater-level time-series data from the USGS National Water Information System and outputs text reports that describe the analyses. The Hydrologic Toolbox supersedes and replaces the Groundwater and Surface-Water Toolboxes.

    The Hydrologic Toolbox was developed by use of the DotSpatial geographic information system (GIS) programming library, which is part of the MapWindow project (MapWindow, 2021). DotSpatial is a nonproprietary, open-source program written for the .NET framework that includes a spatial data viewer and GIS capabilities. This software archive is designed to document different versions of the Hydrologic Toolbox. Details about version changes are provided in the "Release.txt" file with this software release. Instructions for installing the software are provided in the files "Installation_instructions.pdf" and "Installation_instructions.txt." The "Installation_instructions.pdf" file includes screen captures of some of the installation steps, whereas the "Installation_instructions.txt" file does not. Each version of the Hydrologic Toolbox is provided in a separate .zip file.

    Citations:
    Barlow, P.M., Cunningham, W.L., Zhai, T., and Gray, M., 2014, U.S. Geological Survey groundwater toolbox, a graphical and mapping interface for analysis of hydrologic data (version 1.0)—User guide for estimation of base flow, runoff, and groundwater recharge from streamflow data: U.S. Geological Survey Techniques and Methods 3–B10, 27 p., https://doi.org/10.3133/tm3B10.
    Kiang, J.E., Flynn, K.M., Zhai, T., Hummel, P., and Granato, G., 2018, SWToolbox: A surface-water toolbox for statistical analysis of streamflow time series: U.S. Geological Survey Techniques and Methods, book 4, chap. A–11, 33 p., https://doi.org/10.3133/tm4A11.
    MapWindow, 2021, MapWindow software, accessed January 9, 2021, at https://www.mapwindow.org/#home.
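
    As a concrete example of one of the hydrograph-separation methods named above, the sketch below implements Eckhardt's two-parameter recursive digital filter in Python on a made-up streamflow series. It illustrates the published filter equation and is not code from the Hydrologic Toolbox; the parameter values and initialization are assumptions.

    ```python
    # Eckhardt two-parameter baseflow filter (illustrative, not Hydrologic Toolbox code).
    import numpy as np

    def eckhardt_baseflow(q, a=0.98, bfi_max=0.80):
        """Return a baseflow series for streamflow q using Eckhardt's recursive filter."""
        q = np.asarray(q, dtype=float)
        b = np.empty_like(q)
        b[0] = q[0] * bfi_max          # simple initialization assumption
        for t in range(1, len(q)):
            b[t] = ((1 - bfi_max) * a * b[t - 1] + (1 - a) * bfi_max * q[t]) / (1 - a * bfi_max)
            b[t] = min(b[t], q[t])     # baseflow cannot exceed total streamflow
        return b

    streamflow = np.array([12, 15, 40, 90, 60, 35, 25, 20, 17, 15, 14, 13], dtype=float)
    baseflow = eckhardt_baseflow(streamflow)
    print("Baseflow fraction of total flow:", round(float(baseflow.sum() / streamflow.sum()), 2))
    ```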

  7. Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE)

    • technavio.com
    pdf
    Updated Feb 8, 2025
    Cite
    Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
    Available download formats:
    pdf
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description


    Data Science Platform Market Size 2025-2029

    The data science platform market is forecast to grow by USD 763.9 million at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the data science platform market.

    Major Market Trends & Insights

    North America dominated the market and accounted for a 48% growth during the forecast period.
    By Deployment - On-premises segment was valued at USD 38.70 million in 2023
    By Component - Platform segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 1.00 million
    Market Future Opportunities: USD 763.90 million
    CAGR : 40.2%
    North America: Largest market in 2023
    

    Market Summary

    The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations.
    According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.
    

    What will be the Size of the Data Science Platform Market during the forecast period?


    How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?

    The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      On-premises
      Cloud
    
    
    Component
    
      Platform
      Services
    
    
    End-user
    
      BFSI
      Retail and e-commerce
      Manufacturing
      Media and entertainment
      Others
    
    
    Sector
    
      Large enterprises
      SMEs
    
    
    Application
    
      Data Preparation
      Data Visualization
      Machine Learning
      Predictive Analytics
      Data Governance
      Others
    
    
    Geography
    
      North America
    
        US
        Canada
    
    
      Europe
    
        France
        Germany
        UK
    
    
      Middle East and Africa
    
        UAE
    
    
      APAC
    
        China
        India
        Japan
    
    
      South America
    
        Brazil
    
    
      Rest of World (ROW)
    

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period.

    In this dynamic and evolving market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.

    Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.

    API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.


    The On-premises segment was valued at USD 38.70 million in 2019 and showed a gradual increase during the forecast period.

  8. Text S1 - Boosted Beta Regression

    • plos.figshare.com
    pdf
    Updated Jun 4, 2023
    Cite
    Matthias Schmid; Florian Wickler; Kelly O. Maloney; Richard Mitchell; Nora Fenske; Andreas Mayr (2023). Text S1 - Boosted Beta Regression [Dataset]. http://doi.org/10.1371/journal.pone.0061623.s001
    Available download formats:
    pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Matthias Schmid; Florian Wickler; Kelly O. Maloney; Richard Mitchell; Nora Fenske; Andreas Mayr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document provides technical details on boosted beta regression, as well as the full list of predictor variables used for the analysis of the NLA data. (PDF)

  9. Data Use in Academia Dataset

    • datacatalog.worldbank.org
    csv, utf-8
    Updated Nov 27, 2023
    Cite
    Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200/data_use_in_academia_dataset
    Available download formats:
    csv, utf-8
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Semantic Scholar Open Research Corpus (S2ORC)
    Brian William Stacy
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cc

    Description

    This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.


    Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.


    We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.


    Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.


    The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.


    To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.


    The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
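
    A compact sketch of these two detection approaches is given below (Python). The country list is truncated for illustration, and spaCy's small English model is assumed to be installed; this is a generic reconstruction of the described approach, not the project's actual code.

    ```python
    # Country detection via regular expressions plus spaCy NER (illustrative).
    import re
    import spacy

    COUNTRIES = ["Iraq", "Kenya", "Brazil", "Japan"]   # in practice, the full ISO 3166 name list
    country_re = re.compile(r"\b(" + "|".join(map(re.escape, COUNTRIES)) + r")\b")

    nlp = spacy.load("en_core_web_sm")   # requires: python -m spacy download en_core_web_sm

    def detect_countries(text):
        regex_hits = set(country_re.findall(text))
        ner_hits = {ent.text for ent in nlp(text).ents if ent.label_ == "GPE"}
        return regex_hits | ner_hits     # an article can be linked to more than one country

    abstract = "We analyze household survey data from Iraq and compare outcomes with Brazil."
    print(detect_countries(abstract))
    ```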


    The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:


    Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.

    There are two classification tasks in this exercise:

    1. identifying whether an academic article is using data from any country

    2. Identifying from which country that data came.

    For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated, or created to produce research findings. For example, a study that reports findings or analysis using survey data uses data. Some clues that indicate a study does use data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.

    After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]

    For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.

    We expect between 10 and 35 percent of all articles to use data.


    The median amount of time that a worker spent on an article, measured as the time between when the article was accepted for classification by the worker and when the classification was submitted, was 25.4 minutes. If human raters were used exclusively rather than machine learning tools, the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review, at a cost of $3,113,244 (assuming a cost of $3 per article, as was paid to MTurk workers).


    A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, and Wolf 2019). We use PyTorch to produce a model that classifies articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model; 900 articles were selected because of computational limitations in training the NLP model. A classification of "uses data" was assigned if the model predicted an article used data with at least 90% confidence.
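
    The classification step can be sketched as follows with the Hugging Face transformers library and PyTorch. The checkpoint name is a placeholder for the fine-tuned DistilBERT model described above (the trained weights are not part of this dataset record); the snippet simply shows how the 90% confidence rule would be applied to a model's output.

    ```python
    # DistilBERT-style classification with a 90% confidence threshold (illustrative).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    checkpoint = "distilbert-base-uncased"   # placeholder; a fine-tuned checkpoint would be used
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    abstract = "Using the 2012 household survey, we estimate poverty rates for each governorate."
    inputs = tokenizer(abstract, truncation=True, return_tensors="pt")

    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)

    uses_data = probs[0, 1].item() >= 0.90   # classify as "uses data" only at >= 90% confidence
    print(f"P(uses data) = {probs[0, 1]:.2f} -> uses data: {uses_data}")
    ```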


    The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model perform the same kind of errors, then the performance reported here will be overestimated.


    The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of

  10. File S1 - Evaluation of Bias-Variance Trade-Off for Commonly Used Post-Summarizing Normalization Procedures in Large-Scale Gene Expression Studies

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 31, 2023
    Cite
    Xing Qiu; Rui Hu; Zhixin Wu (2023). File S1 - Evaluation of Bias-Variance Trade-Off for Commonly Used Post-Summarizing Normalization Procedures in Large-Scale Gene Expression Studies [Dataset]. http://doi.org/10.1371/journal.pone.0099380.s001
    Available download formats:
    pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xing Qiu; Rui Hu; Zhixin Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supporting tables and figures.
    Table S1. The impact of different effect sizes on gene selection strategies when the sample size is fixed and relatively small. Mean (STD) of true positives computed from SIMU1 with 20 repetitions are reported. Sample size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    Table S2. The impact of different effect sizes on gene selection strategies when the sample size is fixed and relatively small. Mean (STD) of false positives computed from SIMU1 with 20 repetitions are reported. Sample size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    Table S3. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively small. Mean (STD) of true positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    Table S4. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively small. Mean (STD) of false positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    Table S5. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively large. Mean (STD) of true positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    Table S6. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively large. Mean (STD) of false positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    Table S7. The impact of different sample sizes on gene selection strategies with simulation based on biological data. Mean (STD) of true positives computed from SIMU-BIO with 20 repetitions are reported. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
    Table S8. The impact of different sample sizes on gene selection strategies with simulation based on biological data. Mean (STD) of false positives computed from SIMU-BIO with 20 repetitions are reported. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
    Table S9. The numbers of differentially expressed genes detected by different selection strategies. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
    Figure S1. Histogram of pairwise Pearson correlation coefficients between genes computed from HYPERDIP without normalization. Number of genes: 9005. Number of arrays: 88. (PDF)

  11. Exploratory Analysis of CMS Open Data: Investigation of Dimuon Mass Spectrum Anomalies in the 10-15 GeV Range

    • zenodo.org
    zip
    Updated Sep 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andre Luis Tomaz Dionísio (2025). Exploratory Analysis of CMS Open Data: Investigation of Dimuon Mass Spectrum Anomalies in the 10-15 GeV Range [Dataset]. http://doi.org/10.5281/zenodo.17220766
    Available download formats:
    zip
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andre Luis Tomaz Dionísio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of an exploratory analysis of CMS Open Data from LHC Run 1 (2010-2012) and Run 2 (2015-2018), focusing on the dimuon invariant mass spectrum in the 10-15 GeV range. The analysis investigates potential anomalies at 11.9 GeV and applies various statistical methods to characterize observed features.
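
    For orientation, the short Python sketch below shows how a dimuon invariant mass is reconstructed from the (pt, eta, phi, mass) muon kinematics stored in CMS NanoAOD; the numbers are synthetic and the snippet is illustrative rather than the analysis code archived in this record (which would read the branches with uproot/awkward).

    ```python
    # Dimuon invariant mass from (pt, eta, phi, m) kinematics (illustrative, synthetic values).
    import numpy as np

    def four_vector(pt, eta, phi, m):
        px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
        e = np.sqrt(px**2 + py**2 + pz**2 + m**2)
        return np.array([e, px, py, pz])

    def invariant_mass(p1, p2):
        e, px, py, pz = p1 + p2
        return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

    MU_MASS = 0.1057  # GeV
    mu_plus = four_vector(pt=22.0, eta=0.4, phi=1.2, m=MU_MASS)     # synthetic muon kinematics
    mu_minus = four_vector(pt=18.0, eta=-0.2, phi=-1.8, m=MU_MASS)

    print(f"m(mu+mu-) = {invariant_mass(mu_plus, mu_minus):.2f} GeV")
    ```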

    Methodology:

    • Event selection and reconstruction using CMS NanoAOD format
    • Dimuon invariant mass analysis with background estimation
    • Angular distribution studies for quantum number determination
    • Statistical analysis including significance testing
    • Systematic uncertainty evaluation
    • Conservation law verification

    Key Analysis Components:

    • Mass spectrum reconstruction and peak identification
    • Background modeling using sideband methods
    • Angular correlation analysis (sphericity, thrust, momentum distributions)
    • Cross-validation using multiple event selection criteria
    • Monte Carlo comparison for background understanding

    Results Summary: The analysis identifies several features in the dimuon mass spectrum requiring further investigation. Preliminary observations suggest potential anomalies around 11.9 GeV, though these findings require independent validation and peer review before drawing definitive conclusions.

    Data Products:

    • Processed event datasets
    • Analysis scripts and methodology
    • Statistical outputs and uncertainty estimates
    • Visualization tools and plots
    • Systematic studies documentation

    Limitations: This work represents preliminary exploratory analysis. Results have not undergone formal peer review and should be considered investigative rather than conclusive. Independent replication and validation by the broader physics community are essential before any definitive claims can be made.

    Keywords: CMS experiment, dimuon analysis, mass spectrum, exploratory analysis, LHC data, particle physics, statistical analysis, anomaly investigation

    # Dark Photon Search at 11.9 GeV

    ## Executive Summary

    **Historic Search: First Evidence of a Massive Dark Photon**

    We report the search for a new vector gauge boson at 11.9 GeV, identified as a dark photon (A'), representing the first confirmed portal anomaly between the Standard Model and a hidden sector. This search, based on CMS Open Data from LHC Run 1 (2010-2012) and Run 2 (2015-2018), provides direct experimental evidence for physics beyond the Standard Model.

    ## Search Highlights

    ### Anomaly Properties
    - **Mass**: 11.9 ± 0.1 GeV
    - **Quantum Numbers**: J^PC = 1^-- (vector gauge boson)
    - **Spin**: 1
    - **Parity**: Negative
    - **Isospin**: 0 (singlet)
    - **Hypercharge**: 0

    ### Statistical Significance
    - **Total Events**: 63,788 candidates in Run 1
    - **Signal Strength**: > 5σ significance
    - **Decay Channel**: A' → μ⁺μ⁻ (dominant)
    - **Branching Ratio**: ~50% to neutral pairs

    ### Conservation Laws
    All fundamental symmetries preserved:
    - ✓ Energy-momentum
    - ✓ Charge
    - ✓ Lepton number
    - ✓ CPT

    ## Project Structure

    ```
    search/
    ├── README.md # This file
    ├── docs/
    │ ├── paper/ # Main search paper
    │ │ ├── manuscript.tex # LaTeX source
    │ │ ├── abstract.txt # Paper abstract
    │ │ └── figures/ # Paper figures
    │ └── supplementary/ # Additional materials
    │ ├── methods.pdf # Detailed methodology
    │ ├── systematics.pdf # Systematic uncertainties
    │ └── theory.pdf # Theoretical implications
    ├── data/
    │ ├── run1/ # 7-8 TeV (2010-2012)
    │ │ ├── raw/ # Original ROOT files
    │ │ ├── processed/ # Processed datasets
    │ │ └── results/ # Analysis outputs
    │ └── run2/ # 13 TeV (2015-2018)
    │ ├── raw/ # Original ROOT files
    │ ├── processed/ # Processed datasets
    │ └── results/ # Analysis outputs
    ├── analysis/
    │ └── scripts/ # Analysis code
    │ ├── dark_photon_symmetry_analysis.py
    │ ├── hidden_sector_10_150_search.py
    │ ├── hidden_10_15_gev_analysis.py
    │ └── validation/ # Cross-checks
    ├── figures/ # Publication-ready plots
    │ ├── mass_spectrum.png # Invariant mass distribution
    │ ├── angular_dist.png # Angular distributions
    │ ├── symmetry_plots.png # Symmetry analysis
    │ └── cascade_spectrum.png # Hidden sector cascade
    └── validation/ # Systematic studies
    ├── background_estimation/
    ├── signal_extraction/
    └── systematic_errors/
    ```

    ## Key Evidence

    ### 1. Quantum Number Determination
    - **Angular Distribution**: ⟨|P₁|⟩ = 0.805 (strong anisotropy)
    - **Quadrupole Moment**: ⟨P₂⟩ = 0.573 (non-zero)
    - **Anomaly Type Score**: Vector = 90/100 (Preliminary)

    ### 2. Hidden Sector Connection
    - 236,181 total events in 10-150 GeV range
    - Exponential cascade spectrum indicating hidden valley dynamics
    - Dark photon serves as portal anomaly

    ### 3. Decay Topology
    - **Sphericity**: 0.161 (jet-like)
    - **Thrust**: 0.686 (moderate collimation)
    - Consistent with two-body decay A' → μ⁺μ⁻

    ## Physical Interpretation

    The reported anomaly represents:
    1. **New Force Carrier**: Fifth fundamental force beyond the four known forces
    2. **Portal Anomaly**: Mediator between Standard Model and hidden/dark sector
    3. **Dark Matter Connection**: Potential mediator for dark matter interactions

    ## Theoretical Framework

    ### Kinetic Mixing
    The dark photon arises from kinetic mixing between U(1)_Y (hypercharge) and U(1)_D (dark charge):
    ```
    L_mix = -(ε/2) F_μν^Y F^Dμν
    ```
    where ε is the mixing parameter (~10^-3 based on observed coupling).

    ### Hidden Valley Scenario
    The exponential cascade spectrum suggests:
    - Complex hidden sector with multiple states
    - Possible dark hadronization
    - Rich phenomenology awaiting exploration

    ## Collaborators and Credits

    **Lead Analysis**: CMS Open Data Analysis Team
    **Data Source**: CERN Open Data Portal
    **Period**: 2010-2012 (Run 1), 2015-2018 (Run 2)
    **Computing**: Local analysis on CMS NanoAOD format



    ## How to Reproduce

    ### Requirements
    ```bash
    pip install uproot awkward numpy matplotlib
    ```

    ### Quick Start
    ```bash
    cd analysis/scripts/
    python dark_photon_symmetry_analysis.py
    python hidden_10_15_gev_analysis.py
    ```

    ## Significance Statement

    This search represents the first confirmed evidence of a portal anomaly connecting the Standard Model to a hidden sector. The 11.9 GeV dark photon opens an entirely new frontier in anomaly physics, providing experimental access to previously invisible physics and potentially explaining dark matter interactions.

    ## Contact

    For questions about this search or collaboration opportunities:
    - Email: andreluisdionisio@gmail.com

    ---

    "We're not at the end of anomaly physics - we're at the beginning of dark sector physics!"

    3665778186 00382C40-4D7F-E211-AD6F-003048FFCBFC.root
    2581315530 0E5F189B-5D7F-E211-9423-002354EF3BE1.root
    2149825126 1AE176AC-5A7F-E211-8E63-00261894397D.root
    1792851725 2044D46B-DE7F-E211-9C82-003048FFD76E.root
    3186214416 4CAE8D51-4A7F-E211-9937-0025905964A2.root
    3220923349 72FDEF89-497F-E211-9CFA-002618943958.root
    2555255008 7A35A5A2-547F-E211-940B-003048678DA2.root
    3875410897 7E942EED-457F-E211-938E-002618FDA28E.root
    2409745919 8406DE2F-407F-E211-A6A5-00261894395F.root
    2421251748 8A61DAA8-3C7F-E211-94A6-002618943940.root
    2315643699 98909097-417F-E211-9009-002618943838.root
    2614932091 A0963AD9-567F-E211-A8AF-002618943901.root
    2438057881 ACE2DF9A-477F-E211-9C29-003048679266.root
    2206652387 B6AA897F-467F-E211-8381-002618943854.root
    2365666837 C09519C8-4B7F-E211-9BCE-003048678B34.root
    2477336101 C68AE3A5-447F-E211-928E-00261894388B.root
    2556444022 C6CEC369-437F-E211-81B0-0026189438BD.root
    3184171088 D60FF379-4E7F-E211-8BA4-002590593878.root
    2381001693

  12. Sports Analytics Market Analysis North America, APAC, Europe, South America, Middle East and Africa - US, Canada, China, Germany, UK, India, Japan, France, Italy, South Korea - Size and Forecast 2025-2029

    • technavio.com
    pdf
    Updated Jan 29, 2025
    Cite
    Technavio (2025). Sports Analytics Market Analysis North America, APAC, Europe, South America, Middle East and Africa - US, Canada, China, Germany, UK, India, Japan, France, Italy, South Korea - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/sports-analytics-market-industry-analysis
    Available download formats:
    pdf
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description


    Sports Analytics Market Size 2025-2029

    The sports analytics market is forecast to grow by USD 8.4 billion at a CAGR of 28.5% from 2024 to 2029. Increased adoption of cloud-based deployment solutions will drive the sports analytics market.

    Major Market Trends & Insights

    North America dominated the market and accounted for a 38% growth during the forecast period.
    By Type - Football segment was valued at USD 749.30 billion in 2023
    By Solution - Player analysis segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 584.13 million
    Market Future Opportunities: USD 8403.30 million
    CAGR : 28.5%
    North America: Largest market in 2023
    

    Market Summary

    The market represents a dynamic and ever-evolving industry, driven by advancements in core technologies and applications. Notably, the increasing adoption of cloud-based deployment solutions and the growth in use of wearable devices are key market trends. These developments enable real-time data collection and analysis, enhancing team performance and fan engagement. However, the market faces challenges, such as limited potential for returns on investment.
    Despite this, the market continues to expand, with a recent study indicating that over 30% of sports organizations have adopted sports analytics. This underscores the market's potential to revolutionize the way sports are managed and enjoyed.
    

    What will be the Size of the Sports Analytics Market during the forecast period?


    How is the Sports Analytics Market Segmented and what are the key trends of market segmentation?

    The sports analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Type
    
      Football
      Cricket
      Hockey
      Tennis
      Others
    
    
    Solution
    
      Player analysis
      Team performance analysis
      Health assessment
      Fan engagement analysis
      Others
    
    
    Geography
    
      North America
    
        US
        Canada
    
    
      Europe
    
        France
        Germany
        Italy
        UK
    
    
      APAC
    
        China
        India
        Japan
        South Korea
    
    
      Rest of World (ROW)
    

    By Type Insights

    The football segment is estimated to witness significant growth during the forecast period.

    The market is experiencing significant growth, driven by the increasing demand for data-driven insights in football and other popular sports. According to recent reports, the market for sports analytics is currently expanding by approximately 18% annually, with a projected growth rate of around 21% in the coming years. This growth can be attributed to the integration of statistical modeling techniques, game outcome prediction, and physiological data into tactical decision support systems. Skill assessment metrics, win probability estimation, and wearable sensor data are increasingly being used to enhance performance and optimize training programs. Data visualization tools, data-driven coaching decisions, deep learning applications, and machine learning models are revolutionizing player workload management and predictive modeling algorithms.


    The Football segment was valued at USD 749.30 billion in 2019 and showed a gradual increase during the forecast period.

    Three-dimensional motion analysis, recruiting optimization tools, sports data integration, and computer vision systems are transforming performance metrics dashboards and motion capture technology. Biomechanical analysis software, fatigue detection systems, talent identification systems, game strategy optimization, opponent scouting reports, athlete performance monitoring, video analytics platforms, real-time game analytics, and injury risk assessment are all integral components of the market. These technologies enable teams and organizations to make informed decisions, improve player performance, and reduce the risk of injuries. The ongoing evolution of sports analytics is set to continue, with new applications and innovations emerging in the field.


    Regional Analysis

    North America is estimated to contribute 38% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.


    The market in the North American region is experiencing significant growth due to technological advancements and increasing investments. In 2024, the US and Canada were major contributors to this expansion. The adoption of sports software is a driving factor, with a high emphasis on its use in American football, basketball, and baseball. Major sports leagues in the US are

  13. Appendix S1 - Epidemiology of Functional Abdominal Bloating and Its Impact...

    • plos.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Meijing Wu; Yanfang Zhao; Rui Wang; Wenxin Zheng; Xiaojing Guo; Shunquan Wu; Xiuqiang Ma; Jia He (2023). Appendix S1 - Epidemiology of Functional Abdominal Bloating and Its Impact on Health Related Quality of Life: Male-Female Stratified Propensity Score Analysis in a Population Based Survey in Mainland China [Dataset]. http://doi.org/10.1371/journal.pone.0102320.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Meijing Wu; Yanfang Zhao; Rui Wang; Wenxin Zheng; Xiaojing Guo; Shunquan Wu; Xiuqiang Ma; Jia He
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Characteristics of baseline covariates and standardized bias before and after propensity score (PS) adjustment using weighting by the odds in 20% of the total respondents; a cross-sectional study in five cities, China, 2007–2008 (n = 3,179). (PDF)
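
    The appendix refers to standardized bias before and after propensity-score adjustment using weighting by the odds. A minimal sketch of that computation is shown below; the column names (treated, ps, age) and the simulated data are placeholders, not values from the study.

      # Sketch of "weighting by the odds" (ATT weights) and standardized bias for one covariate.
      # Hypothetical columns: 'treated' (0/1), 'ps' (estimated propensity score), 'age' (covariate).
      import numpy as np
      import pandas as pd

      def att_weights(df):
          """Treated units get weight 1; controls get the odds ps / (1 - ps)."""
          odds = df["ps"] / (1.0 - df["ps"])
          return np.where(df["treated"] == 1, 1.0, odds)

      def standardized_bias(df, covariate, weights=None):
          """100 * (weighted mean difference) / pooled SD of the unweighted groups."""
          w = np.ones(len(df)) if weights is None else np.asarray(weights)
          t = (df["treated"] == 1).to_numpy()
          m1 = np.average(df.loc[t, covariate], weights=w[t])
          m0 = np.average(df.loc[~t, covariate], weights=w[~t])
          pooled_sd = np.sqrt((df.loc[t, covariate].var() + df.loc[~t, covariate].var()) / 2)
          return 100.0 * (m1 - m0) / pooled_sd

      rng = np.random.default_rng(0)
      df = pd.DataFrame({"treated": rng.integers(0, 2, 500), "age": rng.normal(45, 12, 500)})
      df["ps"] = 1 / (1 + np.exp(-(df["age"] - 45) / 12))   # toy propensity model
      w = att_weights(df)
      print("standardized bias before:", round(standardized_bias(df, "age"), 1))
      print("standardized bias after: ", round(standardized_bias(df, "age", w), 1))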

  14. Data on the prevalence and severity of caries in children in Italy, in...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Oct 6, 2020
    Cite
    Campus, Guglielmo; Strohmenger, Laura; Cagetti, Maria Grazia; Cocco, Fabio (2020). Data on the prevalence and severity of caries in children in Italy, in relation to socioeconomic inequalities [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000588829
    Explore at:
    Dataset updated
    Oct 6, 2020
    Authors
    Campus, Guglielmo; Strohmenger, Laura; Cagetti, Maria Grazia; Cocco, Fabio
    Area covered
    Italy
    Description

    This fileset consists of:
      Four datasets in .xlsx file format
      Four files in .pdf file format, derived using the online statistical software MEDCalc
      One dataset in .dta file format
      Two datasets in .smcl file format

    The datasets in .xlsx file format are as follows:
      caries distribution.xlsx: raw data on caries distribution
      GDP_caries.xlsx: raw data on the association between Gross National Product (GNP) and caries
      GDP_Gini.xlsx: data on the association between GNP and the Gini index
      GINi_Caries.xlsx: data on the association between caries and the Gini index

    The files in .pdf file format contain results derived using the online statistical software MEDCalc:
      proportions calculator ICDAS5_6 South Central.pdf: comparison of ICDAS (International Caries Detection and Assessment System) scores between South and Central Italy
      Comparison of proportions ICDAS5_6 NEast_islands.pdf: comparison of ICDAS scores between North-East Italy and the Italian islands
      Comparison of proportions ICDAS5_6 north-east-South.pdf: comparison of ICDAS scores between North-East Italy and South Italy
      Comparison of proportions ICDAS5_6 Nort-West_Islands.pdf: comparison of ICDAS scores between North-West Italy and the Italian islands

    Dataset GDP_ICDAS.dta contains the raw data for the comparison between GNP and ICDAS scores. Datasets GDP_caries.smcl and log odds ratio .smcl contain log analysis data.

    Study aims and methodology: This study aimed to present the results of the second National Survey conducted in Italy on children's oral health, reporting caries prevalence and severity in 12-year-old children and describing the caries figures related to the geographical sections in which income inequalities are present. The survey protocol was approved by the ethical committee of the University of Sassari (Italy) (AOUNIS: 29/16). An information leaflet describing the aim of the project was distributed to parents/guardians of the children, requesting their child's participation in the survey. Only children whose parents/guardians signed the leaflet were enrolled. All methods were performed in accordance with the Declaration of Helsinki (https://www.wma.net/policiespost/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects).

    In total, 7,660 children were recruited and 7,064 were examined (3,459 males and 3,605 females); 596 children (7.78% of the recruited sample) were excluded: 414 had no signed parental consent and 182 were not present in the classroom at the moment of the examination.

    Data were collected by means of clinical examinations using a plain mirror (Hahnenkratt, Königsbach, Germany) and the WHO ballpoint probe (Asa-Dental, Milan, Italy) under artificial light. Caries data were recorded using two-digit ICDAS codes for each tooth surface: the first digit classifies the tooth surface (sound, sealed, restored, crowned or missing) and the second assesses the caries stage on six scores, from sound to an extensive distinct cavity with visible dentine. Score 1 (first visual change in enamel) was not recorded since it requires air drying for proper evaluation. The Gross National Product (GNP) per capita and the Gini index, as a measure of income inequality in each Italian section, were recorded and used as measures of inequality. For more details on the methodology and statistical analysis, please read the related article.

    Software used: Analysis was performed with the STATA 16 statistical software. File GDP_ICDAS.dta can be opened with STATA.
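
    The MEDCalc PDFs listed above report comparisons of proportions between regions. A minimal sketch of the underlying test (a two-proportion z-test with a pooled estimate) is given below; the counts are placeholders, not figures from the survey.

      # Sketch of a comparison of proportions (two-proportion z-test, pooled estimate).
      # The counts below are hypothetical, not values from the Italian survey.
      from math import sqrt
      from statistics import NormalDist

      def two_proportion_z(x1, n1, x2, n2):
          """Return (z, two-sided p) for H0: p1 == p2."""
          p1, p2 = x1 / n1, x2 / n2
          pooled = (x1 + x2) / (n1 + n2)
          se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
          z = (p1 - p2) / se
          p = 2 * (1 - NormalDist().cdf(abs(z)))
          return z, p

      # e.g. children with ICDAS 5-6 lesions in two regions (hypothetical counts)
      z, p = two_proportion_z(x1=120, n1=1500, x2=90, n2=1400)
      print(f"z = {z:.2f}, p = {p:.4f}")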

  15. Data from: Using social media and personality traits to assess software...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 20, 2023
    Cite
    Leo Silva; Marília Gurgel de Castro; Miriam Bernardino Silva; Milena Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira (2023). Using social media and personality traits to assess software developers' emotional polarity [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7846995
    Explore at:
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Faculty of Psychology and Educational Sciences, University of Coimbra, Colégio Novo Street, 3001-802 Coimbra, Portugal
    Centre of Informatics and Systems, University of Coimbra, Polo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
    Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, 59072-970, Natal, Brazil
    Authors
    Leo Silva; Marília Gurgel de Castro; Miriam Bernardino Silva; Milena Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion DATA

    Title: Using social media and personality traits to assess software developers' emotional polarity

    Authors: Leo Moreira Silva; Marília Gurgel Castro; Miriam Bernardino Silva; Milena Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira

    Journal: PeerJ Computer Science

    Github: https://github.com/leosilva/peerj_computer_science_2022

    The folders contain:

    Experiment_Protocol.pdf: document that presents the protocol regarding recruitment, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.

    /analysis
      analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
      analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
      analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications

    /dataset
      alldata.json: contains the dataset used in the paper

    /ethics_committee
      committee_response_english_version.pdf: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version.
      committee_response_original_portuguese_version: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version.
      committee_submission_form_english_version.pdf: the project submitted to the committee. English version.
      committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version.
      consent_form_english_version.pdf: declaration of free and informed consent fulfilled by participants. English version.
      consent_form_original_portuguese_version.pdf: declaration of free and informed consent fulfilled by participants. Portuguese version.
      data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. English version.
      data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. Portuguese version.

    /notebooks
      General - Charts.ipynb: notebook containing all charts produced in the study, including those in the paper
      Statistics - Lexicons and Ensembles.ipynb: notebook with the statistics for the five lexicons and ensembles used in the study
      Statistics - Linear Regression.ipynb: notebook with the multiple linear regression results
      Statistics - Polynomial Regression.ipynb: notebook with the polynomial regression results
      Statistics - Psychologists versus Participants.ipynb: notebook with the statistics between the psychologists' and participants' manual analysis
      Statistics - Working x Non-working.ipynb: notebook containing the statistical analysis for the tweets posted during the working period and those posted outside of the working period

    /surveys
      Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study; collects demographic data and participants' authorization to access their public Tweet posts. English version.
      Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study; collects demographic data and participants' authorization to access their public Tweet posts. Portuguese version.
      Demographic_Survey_answers.xlsx: participants' demographic survey answers
      ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument used to infer participants' Big Five personality traits
      ibf_en.doc: English translation of the Portuguese version of the Big Five Inventory (BFI) instrument
      ibf_answers.xlsx: participants' and psychologists' answers for the BFI

    We have removed any sensitive data from the dataset and from the demographic survey answers to protect participants' privacy and anonymity.
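
    As a minimal sketch of how the two manual-analysis files above could be compared, the snippet below computes Cohen's kappa between psychologists' and participants' labels; the 'tweet_id' and 'polarity' column names are assumptions about the CSV layout, not documented here.

      # Agreement between the psychologists' and the participants' manual classifications.
      # File names are from the repository listing; column names are assumed.
      import pandas as pd
      from sklearn.metrics import cohen_kappa_score

      psy = pd.read_csv("analysis/analyzed_tweets_by_psychologists.csv")
      par = pd.read_csv("analysis/analyzed_tweets_by_participants.csv")

      merged = psy.merge(par, on="tweet_id", suffixes=("_psy", "_par"))
      kappa = cohen_kappa_score(merged["polarity_psy"], merged["polarity_par"])
      print(f"Cohen's kappa (psychologists vs participants): {kappa:.2f}")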

  16. PRIEST study anonymised dataset

    • orda.shef.ac.uk
    • figshare.shef.ac.uk
    Updated May 30, 2023
    Cite
    Benjamin Thomas; Laura Sutton; Steve Goodacre; Katie Biggs; Amanda Loban (2023). PRIEST study anonymised dataset [Dataset]. http://doi.org/10.15131/shef.data.13194845.v1
    Explore at:
    Dataset updated
    May 30, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Benjamin Thomas; Laura Sutton; Steve Goodacre; Katie Biggs; Amanda Loban
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PRIEST study used patient data from the early phases of the COVID-19 pandemic. The PRIEST study provided descriptive statistics of UK patients with suspected COVID-19 in an emergency department cohort, analysis of existing triage tools, and derivation and validation of a COVID-19 specific tool for adults with suspected COVID-19. For more details please go to the study website: https://www.sheffield.ac.uk/scharr/research/centres/cure/priest

    Files contained in the PRIEST study data repository. Main files include:
      PRIEST.csv: dataset containing 22,445 observations and 119 variables. Data include initial presentation and follow-up, one row per participant.
      PRIEST_variables.csv: variable names, values and brief descriptions.
    Additional files include:
      Follow-up v4.0 PDF - Blank 30-day follow-up data collection tool
      Pandemic Respiratory Infection Form v7 PDF - Blank baseline data collection tool
      PRIEST protocol v11.0_17Aug20 PDF - Study protocol
      PRIEST_SAP_v1.0_19jun20 PDF - Statistical analysis plan

    The PRIEST data sharing plan follows a controlled access model, as described in Good Practice Principles for Sharing Individual Participant Data from Publicly Funded Clinical Trials. Data sharing requests should be emailed to priest-study@sheffield.ac.uk and will be considered carefully as to whether they are necessary to fulfil the purpose of the request. For approval of a data sharing request, an approved ethical review and study protocol must be provided. The PRIEST study was approved by NRES Committee North West - Haydock, REC reference: 12/NW/0303.
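
    A minimal sketch for loading the two main files named above with pandas; check PRIEST_variables.csv for the documented meaning of each column before analysis.

      # Load the PRIEST dataset and its variable dictionary (file names from the description above).
      import pandas as pd

      priest = pd.read_csv("PRIEST.csv")               # expected: 22,445 rows x 119 columns
      variables = pd.read_csv("PRIEST_variables.csv")  # variable names, values, brief descriptions

      print(priest.shape)
      print(variables.head())                          # inspect documented meanings before analysis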

  17. Questionnaire data on land use change of Industrial Heritage: Insights from...

    • data.mendeley.com
    Updated Jul 20, 2023
    + more versions
    Cite
    Arsalan Karimi (2023). Questionnaire data on land use change of Industrial Heritage: Insights from Decision-Makers in Shiraz, Iran [Dataset]. http://doi.org/10.17632/gk3z8gp7cp.2
    Explore at:
    Dataset updated
    Jul 20, 2023
    Authors
    Arsalan Karimi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iran, Shiraz
    Description

    The survey dataset for identifying Shiraz old silo's new use includes four components:

    1. The survey instrument used to collect the data, "SurveyInstrument_table.pdf". The survey instrument contains 18 main closed-ended questions in a table format. Two of these concern information on the Silo's decision-makers and the proposed new use, followed up after a short introduction of the questionnaire; the other 16 (each identifying 3 variables) relate to the level of appropriate opinions for ideal intervention in the Façade, Openings, Materials and Floor heights of the building across four values: Feasibility, Reversibility, Compatibility and Social Benefits.

    2. The raw survey data, "SurveyData.rar". This file contains an Excel .xlsx and an SPSS .sav file. The survey data file contains 50 variables (12 for each of the four values, separated by colour) and data from each of the 632 respondents. Answering each question in the survey was mandatory, therefore there are no blanks or non-responses in the dataset. In the .sav file, all variables were assigned the numeric type and nominal measurement level. More details about each variable can be found in the Variable View tab of this file. Additional variables were created by grouping or consolidating categories within each survey question for simpler analysis; these variables are listed in the last columns of the .xlsx file.

    3. The analysed survey data, "AnalysedData.rar". This file contains 6 SPSS Statistics Output Documents which demonstrate statistical tests and analyses such as mean, correlation, automatic linear regression, reliability, frequencies, and descriptives.

    4. The codebook, "Codebook.rar". The detailed SPSS "Codebook.pdf", alongside the simplified codebook "VariableInformation_table.pdf", provides a comprehensive guide to all 50 variables in the survey data, including numerical codes for survey questions and response options. They serve as valuable resources for understanding the dataset, presenting dictionary information, and providing descriptive statistics, such as counts and percentages for categorical variables.
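
    A minimal sketch for reading the SPSS file from SurveyData.rar (once extracted) and estimating scale reliability of the kind reported in the analysed outputs; pyreadstat is assumed to be available, and the .sav file name and the Feasibility column prefix are placeholders, not documented names.

      # Read the SPSS survey file and compute Cronbach's alpha for one value scale.
      # 'SurveyData.sav' and the 'feas' column prefix are assumptions about the file layout.
      import pandas as pd
      import pyreadstat

      def cronbach_alpha(items):
          """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
          k = items.shape[1]
          item_var = items.var(axis=0, ddof=1).sum()
          total_var = items.sum(axis=1).var(ddof=1)
          return k / (k - 1) * (1 - item_var / total_var)

      df, meta = pyreadstat.read_sav("SurveyData.sav")
      print(meta.column_labels[:5])                    # labels mirror the SPSS Variable View tab

      feasibility = [c for c in df.columns if c.lower().startswith("feas")]
      if len(feasibility) > 1:
          print("Cronbach's alpha (Feasibility):", round(cronbach_alpha(df[feasibility]), 2))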

  18. Data and code from: Unoccupied aerial systems adoption in agricultural...

    • agdatacommons.nal.usda.gov
    • zenodo.org
    bin
    Updated Aug 20, 2025
    Cite
    Jennifer Lachowiec; David LeBauer; Filipe Matias; Max Feldman (2025). Data and code from: Unoccupied aerial systems adoption in agricultural research [Dataset]. http://doi.org/10.5281/zenodo.10573428
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Zenodo
    Authors
    Jennifer Lachowiec; David LeBauer; Filipe Matias; Max Feldman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains data and code supporting the findings of the study on the adoption of Unoccupied Aerial Systems (UAS) in agricultural research, as reported by Lachowiec et al. (2024) in The Plant Phenome Journal. We collected data through an online survey as well as through in-person interviews.

    Description of Repository Contents

    Data

    Data are in the /data directory:

      Ag_Drones_Codebook_14Jun2023.pdf: Codebook providing detailed descriptions of survey questions and coding schemes, including detailed descriptions of the content of the two CSV files listed below.
      Results_Ag_Drones_2021_Survey.csv: Raw survey data collected from agricultural researchers regarding their use of UAS technology.
      countries_code.csv: Country codes used in the survey data for respondent location.
      interviews/: Directory containing interview transcripts and a summary, provided in both Microsoft Word and plain text (Markdown) formats, specifically:
        Notes from nine one-on-one interviews, named UAS_Interview.[md|docx]
        A summary document, Feldman_AG2PI_InterviewSummary_2022-08-10.docx

    Code

    Code used to process data and generate the manuscript's analysis and figures.

      data_code.R: R script for preprocessing and cleaning the survey data.
      dataAnalysis.R: R script for statistical analysis and visualization of survey results.
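
    The repository's own pipeline is the two R scripts above; purely as an illustration (kept in Python to match the other sketches in this listing), joining the raw survey export with the country-code lookup might look like the following, with the join column name assumed rather than taken from the codebook.

      # Illustration only: the actual preprocessing lives in data_code.R / dataAnalysis.R.
      # 'country_code' is an assumed join column; consult Ag_Drones_Codebook_14Jun2023.pdf.
      import pandas as pd

      survey = pd.read_csv("data/Results_Ag_Drones_2021_Survey.csv")
      countries = pd.read_csv("data/countries_code.csv")

      merged = survey.merge(countries, on="country_code", how="left")
      print(merged["country_code"].value_counts().head())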

    Citing this work

    This repository contains data and code to support the manuscript:

    Lachowiec, J., Feldman, M.J., Matias, F.I., LeBauer, D., Gregory, A. (2024). Unoccupied aerial systems adoption in agricultural research. Zenodo. The Plant Phenome Journal Volume(Issue), pages 00. doi:DOI

    If you use the data or code from this repository, please also cite:

    Lachowiec, J., Feldman, M.J., Matias, F.I., LeBauer, D., Gregory, A. (2024). Data and code from: Unoccupied aerial systems adoption in agricultural research. Zenodo. doi:10.5281/zenodo.10573428

    And consider contributing cleaned data and code to this repository.

    Acknowledgements and Support

    Acknowledgments: We thank all survey respondents for their participation. We acknowledge the Montana State University HELPS lab for aiding in the development and implementation of the survey.

    Funding: This research was supported by the intramural research program of the U.S. Department of Agriculture, National Institute of Food and Agriculture, Agricultural Genome to Phenome Initiative (2020-70412-32615 and 2021-70412-35233). The findings and conclusions in this preliminary presentation have not been formally disseminated by the U.S. Department of Agriculture and should not be construed to represent any agency determination or policy.

  19. Global business intelligence software market vendor share 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Global business intelligence software market vendor share 2024 [Dataset]. https://www.statista.com/statistics/590031/worldwide-business-analytics-software-vendor-market-share/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    In 2024, Mode was the leading vendor in the global business intelligence (BI) software market, with a market share of ** percent. The source indicates that BI software enables access to and analysis of information in order to improve and optimize decisions and performance.

  20. Global Plant Phenotyping Market Report 2025 Edition, Market Size, Share,...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    + more versions
    Cite
    Cognitive Market Research, Global Plant Phenotyping Market Report 2025 Edition, Market Size, Share, CAGR, Forecast, Revenue [Dataset]. https://www.cognitivemarketresearch.com/plant-phenotyping-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the Global Plant Phenotyping Market Size was USD XX Billion in 2023 and is set to achieve a market size of USD XX Billion by the end of 2031 growing at a CAGR of XX% from 2024 to 2031.

    Based on Product Type, the Equipment segment will dominate the global Plant Phenotyping market in the year 2023. Based on product type, the market is divided into Equipment, Software (Data acquisition, Image analysis, System control, Others), Sensors (Image sensors, NDVI sensors, Temperature sensors, Others)
    Based on Application, the Greenhouse segment dominated the global Plant Phenotyping market. Based on Application, the global Plant Phenotyping market is segmented into Laboratory, Greenhouse, and Field.
    Based on equipment, the imaging systems segment dominated the global Plant Phenotyping market. Based on equipment, the global Plant Phenotyping market is segmented into Growth Chambers/Phytotrons, Imaging Systems, Robotics and Automation, Phenomobiles
    Based on software, the Data Management & Integration Software segment dominated the global Plant Phenotyping market. Based on software, the global Plant Phenotyping market is segmented into Imaging Analysis Software, Data Management & Integration Software, Statistical Analysis and Modeling Software
    The European region accounted for the highest market share in the Global Plant Phenotyping Market. 
    Over the course of the projection period, North America is expected to increase at the fastest rate.
    

    CURRENT SCENARIO OF THE PLANT PHENOTYPING MARKET

    Key factors driving the growth of the Plant Phenotyping Market

    The increasing global food demand is the major factor that is expected to drive the market growth over the forecast period
    

    The need for food will rise due to population growth, which also means that more sophisticated agricultural methods, such as plant phenotyping, will be required to produce crops with higher yields and better quality. This is anticipated to accelerate the demand for plant phenotyping.

    For instance, according to a United Nations Food and Agriculture Organization (FAO) report, projections show that feeding a world population of 9.1 billion people in 2050 would require raising overall food production by some 70 percent between 2005/07 and 2050. (Source: https://www.fao.org/fileadmin/templates/wsfs/docs/Issues_papers/HLEF2050_Global_Agriculture.pdf)

    Food waste results from an increase in seed-borne disease cases; therefore, methods like plant phenotyping are needed to identify illnesses early and aid in the selection of disease-free plant breeds. Pathogens that cause illness can spread through seed dispersal, and if the environment is in their favour, they can spread the disease to developing seedlings.

    For instance, according to an article published by the Department of Primary Industries and Regional Development, Australia, in January 2021, smut is a common type of seed-borne disease that affects crops such as barley and wheat. The smut spores from an infected seed are spread to healthy seeds during harvest and transportation. These spores remain dormant until environmental conditions become favourable and begin to germinate once the environment becomes cool and moist, thus infecting the seedling. (Source: https://www.agric.wa.gov.au/barley/smut-and-bunt-diseases-cereal-biology-identification-and-management)

    Moreover, increasing number of research and development activities is expected to aid in the growth of the market over the forecast period.

    For instance, in August 2020, Onset, a data logger manufacturer, developed a new HOBOnet Soil Moisture Sensor that uses GroPoint technology to measure the temperature and humidity of soil at varying depths. This newly developed sensor is durable, cost-effective, and easy to install. (Source: https://www.onsetcomp.com/who-we-are/news-and-events/news/onset-announces-new-wireless-hobonet-multi-depth-soil-moisture)

    Such phenotypic analysis enables users to check when there is a need for supplying water, fertilizer, pesticide etc. to the agricultural crops.

    Increasing Focus on Precision Agriculture to Accelerate the Market Growth 
    

    The goal of precision agriculture is to manage resources like water, fertiliser, herbicides, and seeds precisely in order to maximise agricultural productivity and minimise input waste and environmental effects. Precision agriculture is made possib...
