58 datasets found

GERD by sector of performance
ec.europa.eu
Updated May 2, 2025
Cite
European Commission (2025). GERD by sector of performance [Dataset]. https://ec.europa.eu/eurostat/databrowser/view/TSC00001/default/table
Explore at:
Dataset updated
May 2, 2025
Dataset authored and provided by
European Commission (http://ec.europa.eu/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This collection provides users with data about R&D expenditure and R&D personnel broken down by the following institutional sectors: business enterprise (BES); government (GOV); higher education (HES); private non-profit (PNP), total of all sectors.

R&D expenditure is broken down by source of funds; sector of performance; type of costs; type of R&D; fields of research and development (FORD); socio-economic objectives (NABS 2007: https://circabc.europa.eu/ui/group/c1b49c83-24a7-4ff2-951c-621ac0a89fd8/library/b4b841e5-d200-41bc-8f23-d0b1e034f689?p=1&n=10&sort=modified_DESC); and regions (NUTS 2 level: https://showvoc.op.europa.eu/#/datasets/ESTAT_Nomenclature_of_Territorial_Units_for_Statistics/data). The business enterprise sector is further broken down by economic activity (NACE Rev.2: https://showvoc.op.europa.eu/#/datasets/ESTAT_Statistical_Classification_of_Economic_Activities_in_the_European_Community_Rev._2/data), size class, and industry orientation.

R&D personnel data are broken down by professional position; sector of performance; educational attainment level; sex; field of research and development (FORD: https://www.oecd.org/innovation/frascati-manual-2015-9789264239012-en.htm); and regions (NUTS 2 level: https://showvoc.op.europa.eu/#/datasets/ESTAT_Nomenclature_of_Territorial_Units_for_Statistics/data). Data for the business enterprise sector are further broken down by size class and economic activity (NACE Rev.2). Researchers are further broken down by age class and citizenship.

R&D data are collected every two years, except for the key R&D indicators (R&D expenditure, R&D personnel in full-time equivalents (FTE), and researchers in FTE, by sector of performance), which the EU Member States transmit annually (from 2003 onwards, on the basis of a legal obligation). Some other breakdowns may also appear annually where data are provided voluntarily.

The data are collected through sample or census surveys, from administrative registers or through a combination of sources.

R&D data are available for the following countries and country groups:

All EU Member States; Candidate Countries; EFTA Countries; and the United States of America, Japan, South Korea and China, for which the Organisation for Economic Co-operation and Development (OECD) is the data provider.

Country groups: EU Member States, Euro Area States.

R&D data are compiled in accordance with the guidelines laid down in OECD (2015), Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, The Measurement of Scientific, Technological and Innovation Activities (https://www.oecd.org/publications/frascati-manual-2015-9789264239012-en.htm), and in the European business statistics methodological manual for R&D statistics, 2023 edition (Eurostat, Manuals and guidelines).
Food Expenditure Series
catalog.data.gov
data.globalchange.gov
+4 more
Updated Apr 21, 2025
Cite
Economic Research Service, Department of Agriculture (2025). Food Expenditure Series [Dataset]. https://catalog.data.gov/dataset/food-expenditure-series
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Economic Research Service (http://www.ers.usda.gov/)
Description
The ERS Food Expenditure Series annually measures total U.S. food expenditures, including purchases by consumers, governments, businesses, and nonprofit organizations. It contributes to the analysis of U.S. food production and consumption by constructing a comprehensive measure of the total value of all food expenditures by final purchasers. Because the term expenditure is often associated with household decisionmaking, it is important to recognize that ERS's series also includes nonhousehold purchases. For example, the series includes the dollar value of domestic food purchases by military personnel and their dependents at military commissary stores and exchanges, the value of commodities and food dollars donated by the Federal government to schools, and the value of food purchased by airlines for serving during flights.
United States US: GERD: % of GDP
ceicdata.com
Updated Mar 15, 2023
Cite
CEICdata.com (2023). United States US: GERD: % of GDP [Dataset]. https://www.ceicdata.com/en/united-states/gross-domestic-expenditure-on-research-and-development-oecd-member-annual/us-gerd--of-gdp
Explore at:
Dataset updated
Mar 15, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2011 - Dec 1, 2022
Area covered
United States
Description
United States US: GERD: % of GDP was reported at 3.586 % in 2022, up from 3.483 % in 2021. The series is updated yearly, with a median of 2.612 % over 42 observations from Dec 1981 to 2022. It reached an all-time high of 3.586 % in 2022 and a record low of 2.268 % in 1981. The series remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. It is categorized under Global Database’s United States – Table US.OECD.MSTI: Gross Domestic Expenditure on Research and Development: OECD Member: Annual.
For the United States, from 2021 onwards, changes to the US BERD survey questionnaire allowed for more exhaustive identification of acquisition costs for ‘identifiable intangible assets’ used for R&D. This has resulted in a substantial increase in reported R&D capital expenditure within BERD. In the business sector, the funds from the rest of the world, previously included in business-financed BERD, are available separately from 2008. From 2006 onwards, GOVERD includes state government intramural performance (most of which is financed by the federal government and by state governments' own funds). From 2016 onwards, PNPERD data are based on a new R&D performer survey. In the higher education sector, all fields of SSH are included from 2003 onwards.
Following a survey of federally-funded research and development centers (FFRDCs) in 2005, it was concluded that FFRDC R&D belongs in the government sector rather than the sector of the FFRDC administrator, as had been reported in the past. R&D expenditures by FFRDCs were reclassified from the other three R&D performing sectors to the Government sector; previously published data were revised accordingly. Between 2003 and 2004, the method used to classify data by industry was revised. This particularly affects the ISIC category “wholesale trade” and consequently the BERD for total services.
U.S. R&D data are generally comparable, but there are some areas of underestimation:

i) Up to 2008, Government sector R&D performance covers only federal government activities. That by State and local government establishments is excluded;

ii) Except for the Government and the Business Enterprise sectors, the R&D data exclude most capital expenditures. For the Business Enterprise sector, depreciation is reported in place of gross capital expenditures up to 2014. Higher education (and national total) data were revised back to 1998 due to an improved methodology that corrects for double-counting of R&D funds passed between institutions.
Breakdown by type of R&D (basic research, applied research, etc.) was also revised back to 1998 in the business enterprise and higher education sectors due to improved estimation procedures.
The methodology for estimating researchers was changed as of 1985. In the Government, Higher Education and PNP sectors the data since then refer to employed doctoral scientists and engineers who report their primary work activity as research, development or the management of R&D, plus, for the Higher Education sector, the number of full-time equivalent graduate students with research assistantships averaging an estimated 50 % of their time engaged in R&D activities. As of 1985 researchers in the Government sector exclude military personnel. As of 1987, Higher education R&D personnel also include those who report their primary work activity as design.
Due to lack of official data for the different employment sectors, the total researchers figure is an OECD estimate up to 2019. Comprehensive reporting of R&D personnel statistics by the United States has resumed with records available since 2020, reflecting the addition of official figures for the number of researchers and total R&D personnel for the higher education sector and the Private non-profit sector; as well as the number of researchers for the government sector. The new data revise downwards previous OECD estimates as the OECD extrapolation methods drawing on historical US data, required to produce a consistent OECD aggregate, appear to have previously overestimated the growth in the number of researchers in the higher education sector.
Pre-production development is excluded from Defence GBARD (in accordance with the Frascati Manual) as of 2000. 2009 GBARD data also include the one-time incremental R&D funding legislated in the American Recovery and Reinvestment Act of 2009. Beginning with the 2000 GBARD data, budgets for capital expenditure (“R&D plant” in national terminology) are included. GBARD data for earlier years relate to budgets for current costs only.
Higher Education R&D Survey
nationaldataplatform.org
Updated Jun 22, 2025
Cite
(2025). Higher Education R&D Survey [Dataset]. https://nationaldataplatform.org/catalog/dataset/higher-education-r-d-survey
Explore at:
Dataset updated
Jun 22, 2025
Description
The Higher Education Research and Development (HERD) Survey is a dataset created by the U.S. National Center for Science and Engineering Statistics (NCSES), part of the National Science Foundation (NSF). It serves as the primary source for tracking research and development (R&D) expenditures at U.S. colleges and universities. The dataset collects detailed information on R&D spending by field of research, funding sources (e.g., federal, state, industry), types of research (basic, applied), and R&D personnel. Its purpose is to provide insights into academic R&D trends, supporting policy-making and resource allocation decisions. Key use cases include analyzing federal funding impacts, institutional research capacity, and workforce development in science and engineering. The survey covers institutions expending at least $150,000 in separately accounted R&D annually, ensuring comprehensive coverage of major research institutions. Unique aspects include its annual census design, granular breakdowns by discipline (e.g., computer sciences, biology), and historical tracking of long-term trends, such as the 11.2% increase in total higher education R&D expenditures reported for FY 2023 ($108 billion). Data are widely used for strategic planning in academia, government, and industry.
U.S. Household Mental Health & Covid-19
kaggle.com
Updated Jan 21, 2023
Cite
The Devastator (2023). U.S. Household Mental Health & Covid-19 [Dataset]. https://www.kaggle.com/datasets/thedevastator/u-s-household-mental-health-covid-19/discussion
Explore at:
Dataset updated
Jan 21, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
U.S. Household Mental Health & Covid-19

Assessing the Impact of the Pandemic

By US Open Data Portal, data.gov [source]

About this dataset

This dataset offers a closer look at the mental health care received by U.S. households in the last four weeks during the Covid-19 pandemic. The sheer scale of this crisis is inspiring people of all ages, backgrounds, and geographies to come together to tackle the problem. The Household Pulse Survey from the U.S. Census Bureau was published with federal agency collaboration to produce accurate and timely estimates of how Covid-19 is affecting employment status, consumer spending, food security, housing stability, education interruption, and physical and mental wellness among American households. To deliver meaningful results about wellbeing across demographic characteristics such as age, gender, race/ethnicity, and educational attainment, each household was randomly selected according to weighted criteria that keep the findings accurate. This dataset will help you explore what conditions are like on the ground for everyone affected by Covid-19.

How to use the dataset

This dataset contains information about the mental health care that U.S. households have received in the last 4 weeks, during the Covid-19 pandemic. This data is valuable when wanting to track and measure mental health needs across the country and draw comparisons between regions based on support available.

To use this dataset, it is important to understand each of its columns in order to draw meaningful insights from the data. The ‘Indicator’ column indicates which type of indicator (percentage or absolute number) is being measured by the survey, while ‘Group’ and 'Subgroup' provide more specific details about who was surveyed for each indicator.

The columns ‘Phase’ and 'Time Period' indicate when each indicator was measured, whether during a certain phase or over a particular timespan. The columns 'Value', 'LowCI' and 'HighCI' give the estimate for each measurement together with what appear, from the column names, to be the lower and upper bounds of its confidence interval. The ‘Suppression Flag’ column identifies cases where a value has been suppressed because it falls below a certain benchmark, so suppressed values can be filtered out without inspecting them manually. Finally, ‘Time Period Start Date’ and ‘Time Period End Date’ give the exact dates covered by each measurement period, which is useful for time-series analyses over longer periods.

Overall, when using this dataset, keep in mind which indicator type you are looking at (percentage points or absolute numbers) and its associated group and subgroup, so that you can interpret trends and correlations accurately.

Research Ideas

Analyzing the effects of the Covid-19 pandemic on mental health care among different subgroups such as racial and ethnic minorities, gender and age categories.

Identifying geographical disparities in mental health services by comparing state level data for the same time period.

Comparing changes in mental health care indicators over time to understand how the pandemic has affected people's access to care within a quarter or over longer periods.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: Dataset copyright by authors
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit: provide a link to the license, and indicate if changes were made.
- ShareAlike: distribute your contributions under the same license as the original. ...
Data Use in Academia Dataset
datacatalog.worldbank.org
csv, utf-8
Updated Nov 27, 2023
Cite
Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200?version=1
Explore at:
Available download formats: utf-8, csv
Dataset updated
Nov 27, 2023
Dataset provided by
Brian William Stacy
Semantic Scholar Open Research Corpus (S2ORC)
License
https://datacatalog.worldbank.org/public-licenses?fragment=cc
Description
This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.

Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.

We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.

Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.

The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.

To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
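The regular-expression approach described above can be sketched roughly as follows. This is an illustrative sketch only: the country list is a tiny hand-picked subset of ISO 3166 names, and `COUNTRY_NAMES` and `find_countries` are hypothetical names, not the project's actual code.

```python
import re

# Illustrative subset of ISO 3166 short names (assumption: the real list is complete).
COUNTRY_NAMES = ["Kenya", "Brazil", "India", "Viet Nam", "Republic of Korea"]

# Map lower-cased names back to their canonical spelling.
_CANONICAL = {name.lower(): name for name in COUNTRY_NAMES}

# One case-insensitive pattern; word boundaries stop "India" matching "Indiana".
# Longer names come first so multi-word names are preferred over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(COUNTRY_NAMES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_countries(*fields: str) -> set[str]:
    """Return the set of listed country names found in any of the text fields."""
    found = set()
    for text in fields:
        for match in _PATTERN.finditer(text or ""):
            found.add(_CANONICAL[match.group(1).lower()])
    return found


title = "Household surveys in Kenya and India"
abstract = "We compare results with a pilot study in Indiana."
countries = find_countries(title, abstract)
```

With this list, a title that spells the country as "Vietnam" returns no match, which is the kind of exclusion error the paragraph warns about.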

The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.

The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:

Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. identifying whether an academic article is using data from any country
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using a survey data, uses data. Some clues to indicate that a study does use data includes whether a survey or census is described, a statistical model estimated, or a table or means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.

The median amount of time that a worker spent on an article, measured as the time between when the article was accepted to be classified by the worker and when the classification was submitted was 25.4 minutes. If human raters were exclusively used rather than machine learning tools, then the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, which assumes a cost of $3 per article as was paid to MTurk workers.
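The figures in this paragraph can be reproduced with a short calculation. The sketch below assumes that the median time applies to every article and that "years" means continuous calendar time (24 hours a day, 365 days a year); the text does not state which convention was used, so the conversion is an assumption.

```python
N_ARTICLES = 1_037_748              # size of the sampled corpus
COST_PER_ARTICLE_USD = 3            # rate paid to MTurk workers
MEDIAN_MINUTES_PER_ARTICLE = 25.4   # median time per article

# Total cost if every article were labeled by a human rater.
total_cost_usd = N_ARTICLES * COST_PER_ARTICLE_USD

# Total labeling time, converted to years of continuous time (assumption).
total_minutes = N_ARTICLES * MEDIAN_MINUTES_PER_ARTICLE
MINUTES_PER_YEAR = 60 * 24 * 365
total_years = total_minutes / MINUTES_PER_YEAR

print(f"cost: ${total_cost_usd:,}")
print(f"time: {total_years:.1f} years")
```

Under a working-hours convention (for example 40 hours a week) the number of person-years would be several times larger, so the 50-year figure is best read as continuous time.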

A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model; this number was chosen because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted that an article used data with at least 90% confidence.
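The 90% decision rule can be illustrated with a small sketch. It assumes that the classifier emits two logits ordered as [no data, uses data] and that "confidence" means the softmax probability of the "uses data" class; neither detail is stated in the text, and `softmax` and `uses_data` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uses_data(logits, threshold=0.90):
    """Label an article as "uses data" only if P(uses data) >= threshold.

    Assumption: logits are ordered as [no_data, uses_data].
    """
    p_uses_data = softmax(logits)[1]
    return p_uses_data >= threshold


confident = uses_data([0.0, 3.0])   # P(uses data) is about 0.95
borderline = uses_data([0.0, 2.0])  # P(uses data) is about 0.88
```

A high threshold like this trades recall for precision: articles the model is unsure about are left unclassified as data users rather than risk false positives.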

The performance of the models that classify articles by country and by data use can be compared with the classification by the human raters. We treat the human raters as the ground truth. This may underestimate model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model make the same kinds of errors, then the performance reported here will be overestimated.

The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of
Envestnet | Yodlee's USA Consumer Spending Data (De-Identified) |...
datarade.ai
.sql, .txt
Cite
Envestnet | Yodlee, Envestnet | Yodlee's USA Consumer Spending Data (De-Identified) | Row/Aggregate Level | Consumer Data covering 3600+ public and private corporations [Dataset]. https://datarade.ai/data-products/envestnet-yodlee-s-de-identified-consumer-spending-data-r-envestnet-yodlee
Explore at:
Available download formats: .sql, .txt
Dataset provided by
Yodlee
Envestnet (http://envestnet.com/)
Authors
Envestnet | Yodlee
Area covered
United States of America
Description
Envestnet®| Yodlee®'s Consumer Spending Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.

Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.

We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.

Investors, corporate researchers, and corporates can use our data to answer some key business questions such as:
- How much are consumers spending with specific merchants/brands and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?

Use Cases Categories (our data supports a wide range of use cases, and we look forward to working with new ones):

1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking

2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)

3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence

4. Market Data: Analytics B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis.

Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
Forecast revenue big data market worldwide 2011-2027
statista.com
Updated Feb 13, 2024
Cite
Statista (2024). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
Explore at:
Dataset updated
Feb 13, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

What is Big data?

Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or some of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

Big data analytics

Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.
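The two market values quoted above imply a compound annual growth rate (CAGR), which can be worked out as below. This is an illustrative calculation on the quoted figures only; the rate itself is not stated in the source, and `cagr` is a hypothetical helper name.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Big data and business analytics market, in billion U.S. dollars (figures from the text).
value_2018 = 169
value_2022 = 274

implied_rate = cagr(value_2018, value_2022, 2022 - 2018)
print(f"implied CAGR 2018-2022: {implied_rate:.1%}")
```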
Spending on research and development of each country and “studies for money”...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 3, 2023
Cite
Robert F. Wolff; Stefan Reinders; Michael Barth; Gerd Antes (2023). Spending on research and development of each country and “studies for money” within all specialities (left) and CAM-related studies (right). [Dataset]. http://doi.org/10.1371/journal.pone.0018798.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0018798.t003
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Robert F. Wolff; Stefan Reinders; Michael Barth; Gerd Antes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Latest available data were used. If not stated otherwise, data are from 2007.
1 Gross Domestic Product in billion US dollars; 2 Gross domestic expenditure on R&D; 3 Studies per billion US dollars spent on R&D; 4 2004; 5 2005; 6 2006; 7 R&D conducted by state and local governments is excluded; 8 Due to the lack of a comprehensive business register in South Africa, R&D expenditure may be underestimated by 10% to 15%.
United States US: Government Intramural Expenditure on R&D (GOVERD)
ceicdata.com
Cite
CEICdata.com, United States US: Government Intramural Expenditure on R&D (GOVERD) [Dataset]. https://www.ceicdata.com/en/united-states/gross-domestic-expenditure-on-research-and-development-oecd-member-annual/us-government-intramural-expenditure-on-rd-goverd
Explore at:
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2011 - Dec 1, 2022
Area covered
United States
Description
United States US: Government Intramural Expenditure on R&D (GOVERD) data was reported at 75.823 USD bn in 2023. This records an increase from the previous number of 73.443 USD bn for 2022. United States US: Government Intramural Expenditure on R&D (GOVERD) data is updated yearly, averaging 35.945 USD bn from Dec 1981 (Median) to 2023, with 43 observations. The data reached an all-time high of 75.823 USD bn in 2023 and a record low of 13.455 USD bn in 1981. United States US: Government Intramural Expenditure on R&D (GOVERD) data remains active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s United States – Table US.OECD.MSTI: Gross Domestic Expenditure on Research and Development: OECD Member: Annual.
For the United States, some respondents revised their reporting practices and eliminated expenditures that did not meet the definition of R&D during the 2023 BERD data collection. This has resulted in a meaningful decrease in estimated U.S. R&D performance compared to the amount of 2023 R&D performance that would have been estimated based on respondent reporting practices used in 2022 and earlier. From 2021 onwards, changes to the US BERD survey questionnaire allowed for more exhaustive identification of acquisition costs for ‘identifiable intangible assets’ used for R&D. This has resulted in a substantial increase in reported R&D capital expenditure within BERD. In the business sector, the funds from the rest of the world, previously included in the business-financed BERD, are available separately from 2008. From 2006 onwards, GOVERD includes state government intramural performance (most of which is financed by the federal government and state government own funds). From 2016 onwards, PNPERD data are based on a new R&D performer survey. In the higher education sector all fields of SSH are included from 2003 onwards.
Following a survey of federally-funded research and development centers (FFRDCs) in 2005, it was concluded that FFRDC R&D belongs in the government sector - rather than the sector of the FFRDC administrator, as had been reported in the past. R&D expenditures by FFRDCs were reclassified from the other three R&D performing sectors to the Government sector; previously published data were revised accordingly. Between 2003 and 2004, the method used to classify data by industry has been revised. This particularly affects the ISIC category “wholesale trade” and consequently the BERD for total services.
U.S. R&D data are generally comparable, but there are some areas of underestimation:

i) Up to 2008, Government sector R&D performance covers only federal government activities. That by State and local government establishments is excluded;

ii) Except for the Government and the Business Enterprise sectors, the R&D data exclude most capital expenditures. For the Business Enterprise sector, depreciation is reported in place of gross capital expenditures up to 2014. Higher education (and national total) data were revised back to 1998 due to an improved methodology that corrects for double-counting of R&D funds passed between institutions.
Breakdown by type of R&D (basic research, applied research, etc.) was also revised back to 1998 in the business enterprise and higher education sectors due to improved estimation procedures.
The methodology for estimating researchers was changed as of 1985. In the Government, Higher Education and PNP sectors the data since then refer to employed doctoral scientists and engineers who report their primary work activity as research, development or the management of R&D, plus, for the Higher Education sector, the number of full-time equivalent graduate students with research assistantships averaging an estimated 50 % of their time engaged in R&D activities. As of 1985 researchers in the Government sector exclude military personnel. As of 1987, Higher education R&D personnel also include those who report their primary work activity as design.
Due to lack of official data for the different employment sectors, the total researchers figure is an OECD estimate up to 2021. As of 2022, it is based on official personnel data available for all sectors. For years 2020 and 2021, it is based on official personnel data available for the business, PNP and Higher Education sectors, and OECD estimates for the Government sector (for estimating the missing FFRDC component). For previous years, OECD estimates were readjusted back to 2000.
The government personnel data includes the state government R&D personnel from 2021 and FFRDC R&D personnel from 2022. However, 8 FFRDC centres are not included as they could not report their R&D personnel data. These 8 centres account for 24% of the total R&D expenditure of all FFRDCs in 2022. Pre-production development is excluded from Defence GBARD (in accordance with the Frascati Manual) as of 2000. 2009 GBARD data also includes the one time incremental R&D funding legislated in the American Recovery and Reinvestment Act of 2009. Beginning with the 2000 GBARD data, budgets for capital expenditure – “R&D plant” in national terminology - are included. GBARD data for earlier years relate to budgets for current costs only.
Data from: Medical Expenditure Panel Survey
datacatalog.med.nyu.edu
Updated Apr 9, 2025
+ more versions
Cite
United States - Agency for Healthcare Research and Quality (AHRQ) (2025). Medical Expenditure Panel Survey [Dataset]. https://datacatalog.med.nyu.edu/dataset/10018
Explore at:
Dataset updated
Apr 9, 2025
Dataset provided by
Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
Authors
United States - Agency for Healthcare Research and Quality (AHRQ)
Time period covered
Jan 1, 1996 - Present
Area covered
United States
Description
The Medical Expenditure Panel Survey (MEPS) is a set of large-scale surveys of families and individuals, their medical providers (doctors, hospitals, pharmacies, etc.), and employers across the United States. MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of health insurance held by and available to U.S. workers. Data is publicly-available for two of the four MEPS components: the Household Component and the Insurance Component. Access to Medical Provider Component and Nursing Home Component data requires an application to the Agency for Health Care Research and Quality (AHRQ).
Predicting Credit Card Customer Segmentation
kaggle.com
Updated Mar 10, 2024
Cite
The Devastator (2024). Predicting Credit Card Customer Segmentation [Dataset]. https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 10, 2024
Dataset provided by
Kaggle
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Predicting Credit Card Customer Segmentation

Exploring Key Customer Characteristics

By [source]

About this dataset

This dataset contains customer information collected from a consumer credit card portfolio, with the aim of helping analysts predict customer attrition. It includes demographic details such as age, gender, marital status and income category, as well as information about each customer’s relationship with the credit card provider, such as card type, number of months on book and inactive periods. It also holds data about customers’ spending behavior in the run-up to their churn decision, such as total revolving balance, credit limit and average open-to-buy rate, along with metrics such as the total amount of change from quarter 4 to quarter 1, the average utilization ratio and a Naive Bayes classifier attrition flag (card category combined with contacts count in a 12-month period, dependent count, education level and months inactive). Together, these data points capture up-to-date information that can indicate long-term account stability or an impending departure, which helps when managing a portfolio or serving individual customers.

How to use the dataset

This dataset can be used to analyze the key factors that influence customer attrition. Analysts can use this dataset to understand customer demographics, spending patterns, and relationship with the credit card provider to better predict customer attrition.

Research Ideas

Using the customer demographics, such as gender, marital status, education level and income category to determine which customer demographic is more likely to churn.

Analyzing the customer’s spending behavior leading up to churning and using this data to better predict the likelihood of a customer of churning in the future.

Creating a classifier that can predict potential customers who are more susceptible to attrition based on their credit score, credit limit, utilization ratio and other spending behavior metrics over time; this could be used as an early warning system for predicting potential attrition before it happens

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: BankChurners.csv

| Column name | Description |
|:----------------|:-----------------------------------------------------------------------|
| CLIENTNUM | Unique identifier for each customer. (Integer) |
| Attrition_Flag | Flag indicating whether or not the customer has churned out. (Boolean) |
| Customer_Age | Age of customer. (Integer) |
| Gender | Gender of customer. (String) |
| Dependent_count | Number of dependents that customer has. (Integer) |
| Education_Level ...
Campaign Finance Database
data.amerigeoss.org
data.sfgov.org
+2more
html
Updated Jul 5, 2017
Cite
United States (2017). Campaign Finance Database [Dataset]. https://data.amerigeoss.org/da_DK/dataset/86de2f54-7918-434c-9a7c-a7e69c0ce57b
Explore at:
Available download formats: html
Dataset updated
Jul 5, 2017
Dataset provided by
United States
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
The campaign finance database is the San Francisco Ethics Commission's repository for campaign finance filings. It can answer questions about who is contributing money, who is receiving money, and how it is being spent. Use the campaign finance database to research campaign contributions and expenditures reported on forms provided by the Fair Political Practices Commission. The database provides live access to the Ethics Commission's records. Filings are accessible once processed/posted by the Ethics Commission.

Forms filed with the Ethics Commission can be downloaded in PDF format. Forms filed electronically can be searched and exported in Microsoft Excel format. The following Excel exports are available:

- Excel file based on a search of itemized transactions up to 2,000 rows (Updated immediately, with the exception of FPPC filing deadlines -- within 48 hours);
- Excel file by year or for entire life of a single committee (Updated immediately upon filing submission); and
- Excel file by year for all committees in a single calendar year (Updated every 24 hours).
Medical Expenditure Panel Survey (MEPS) Restricted Data Files
catalog.data.gov
data.virginia.gov
+2more
Updated Jul 29, 2025
+ more versions
Cite
Agency for Healthcare Research and Quality, Department of Health & Human Services (2025). Medical Expenditure Panel Survey (MEPS) Restricted Data Files [Dataset]. https://catalog.data.gov/dataset/medical-expenditure-panel-survey-meps-restricted-data-files
Explore at:
Dataset updated
Jul 29, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
Description
Restricted Data Files Available at the Data Centers

Researchers and users with approved research projects can access restricted data files that have not been publicly released for reasons of confidentiality at the AHRQ Data Center in Rockville, Maryland. Qualified researchers can also access restricted data files through the U.S. Census Research Data Center (RDC) network (http://www.census.gov/ces/dataproducts/index.html -- scroll down the page and click on the Agency for Health Care Research and Quality (AHRQ) link). For information on the RDC research proposal process and the data sets available, read AHRQ-Census Bureau agreement on access to restricted MEPS data.
The Correlates of State Policy Project
kaggle.com
Updated Jul 26, 2017
Cite
Institute for Public Policy and Social Research (2017). The Correlates of State Policy Project [Dataset]. https://www.kaggle.com/ippsr/correlates-state-policy/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 26, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Institute for Public Policy and Social Research
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Correlates of State Policy Project aims to compile, disseminate, and encourage the use of data relevant to U.S. state policy research, tracking policy differences across and change over time in the 50 states. We have gathered more than nine-hundred variables from various sources and assembled them into one large, useful dataset. We hope this Project will become a “one-stop shop” for academics, policy analysts, students, and researchers looking for variables germane to the study of state policies and politics.

Content

The Correlates of State Policy Project includes more than nine-hundred variables, with observations across the U.S. 50 states and time (1900 – 2016). These variables represent policy outputs or political, social, or economic factors that may influence policy differences across the states. The codebook includes the variable name, a short description of the variable, the variable time frame, a longer description of the variable, and the variable source(s) and notes.

Take a look at the codebook PDF to get more information about each column

Acknowledgements

This aggregated data set is only possible because many scholars and students have spent tireless hours creating, collecting, cleaning, and making data publicly available. Thus if you use the dataset, please cite the original data sources.

Jordan, Marty P. and Matt Grossmann. 2016. The Correlates of State Policy Project v.1.10. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).

This dataset was originally downloaded from

http://ippsr.msu.edu/public-policy/correlates-state-policy
Current Population Survey (CPS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/AK4FDD
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
Manufacturing Cost Guide
datasets.ai
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
+1more
Updated Sep 18, 2024
Cite
National Institute of Standards and Technology (2024). Manufacturing Cost Guide [Dataset]. https://datasets.ai/datasets/manufacturing-cost-guide-7ccbc
Explore at:
0Available download formats
Dataset updated
Sep 18, 2024
Dataset authored and provided by
National Institute of Standards and Technology
Description
The Manufacturing Cost Guide is a tool that estimates the costs that US manufacturers face and can be used to help gauge the potential returns on manufacturing industry research projects through cost reductions. These costs are grouped into various standardized categories such as the North American Industry Classification System (NAICS) and the Standard Occupational Classification (SOC) system along with other non-standardized costs.

The tool can be used to answer questions such as:

- An organization is conducting research to reduce redundant engineering labor. How much do manufacturers spend on engineering?
- A researcher is proposing a project to reduce the use of steel by advancing material standards, thereby reducing scrap caused by material deficiencies. How much do manufacturers spend on steel?
- A research organization is proposing to reduce energy consumption from machinery. How much is spent on energy for machine operation?
- An organization wants to promote energy efficient lighting in manufacturing facilities. How much do manufacturers spend on lighting?
Data from: What We Eat In America (WWEIA) Database
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
data.amerigeoss.org
+1more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). What We Eat In America (WWEIA) Database [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/what-we-eat-in-america-wweia-database-f7f35
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Area covered
United States
Description
What We Eat in America (WWEIA) is the dietary intake interview component of the National Health and Nutrition Examination Survey (NHANES). WWEIA is conducted as a partnership between the U.S. Department of Agriculture (USDA) and the U.S. Department of Health and Human Services (DHHS). Two days of 24-hour dietary recall data are collected through an initial in-person interview, and a second interview conducted over the telephone within three to 10 days. Participants are given three-dimensional models (measuring cups and spoons, a ruler, and two household spoons) and/or USDA's Food Model Booklet (containing drawings of various sizes of glasses, mugs, bowls, mounds, circles, and other measures) to estimate food amounts. WWEIA data are collected using USDA's dietary data collection instrument, the Automated Multiple-Pass Method (AMPM). The AMPM is a fully computerized method for collecting 24-hour dietary recalls either in-person or by telephone. For each 2-year data release cycle, the following dietary intake data files are available: Individual Foods File - Contains one record per food for each survey participant. Foods are identified by USDA food codes. Each record contains information about when and where the food was consumed, whether the food was eaten in combination with other foods, amount eaten, and amounts of nutrients provided by the food. Total Nutrient Intakes File - Contains one record per day for each survey participant. Each record contains daily totals of food energy and nutrient intakes, daily intake of water, intake day of week, total number foods reported, and whether intake was usual, much more than usual or much less than usual. The Day 1 file also includes salt use in cooking and at the table; whether on a diet to lose weight or for other health-related reason and type of diet; and frequency of fish and shellfish consumption (examinees one year or older, Day 1 file only). 
DHHS is responsible for the sample design and data collection, and USDA is responsible for the survey’s dietary data collection methodology, maintenance of the databases used to code and process the data, and data review and processing. USDA also funds the collection and processing of Day 2 dietary intake data, which are used to develop variance estimates and calculate usual nutrient intakes.

Resources in this dataset:

Resource Title: What We Eat In America (WWEIA) main web page. File Name: Web Page, url: https://res1wwwd-o-tarsd-o-tusdad-o-tgov.vcapture.xyz/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/wweianhanes-overview/ Contains data tables, research articles, documentation data sets and more information about the WWEIA program. (Link updated 05/13/2020)
Medical Expenditure Panel Survey (MEPS) Insurance Component Data Tools
catalog.data.gov
healthdata.gov
+1more
Updated Jul 25, 2025
+ more versions
Cite
Agency for Healthcare Research and Quality, Department of Health & Human Services (2025). Medical Expenditure Panel Survey (MEPS) Insurance Component Data Tools [Dataset]. https://catalog.data.gov/dataset/medical-expenditure-panel-survey-meps-insurance-component-data-tools
Explore at:
Dataset updated
Jul 25, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
Description
The Medical Expenditure Panel Survey Insurance Component (MEPS-IC) is an annual survey of private employers and State and local governments. The MEPS-IC produces national and State level estimates of employer-sponsored insurance, including offered plans, costs, employee eligibility, and number of enrollees. With the MEPS-IC Data Tools, users can interactively explore maps, trends, and cross-sectional bar charts for topics related to national and state-level employer-based health insurance for employer characteristics/offerings; employee take-up; premiums; contributions; and cost-sharing. The MEPS-IC is sponsored by the Agency for Healthcare Research and Quality and is fielded by the U.S. Census Bureau.
Data from: Research funding for male reproductive health and infertility in...
data.niaid.nih.gov
zenodo.org
+1more
zip
Updated Mar 1, 2022
Cite
Eva Gumerova; Christopher De Jonge; Christopher Barratt (2022). Research funding for male reproductive health and infertility in the UK and USA [2016 – 2019] [Dataset]. http://doi.org/10.5061/dryad.v9s4mw6wc
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.v9s4mw6wc
Dataset updated
Mar 1, 2022
Dataset provided by
University of Minnesota
University of Dundee
Authors
Eva Gumerova; Christopher De Jonge; Christopher Barratt
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
United Kingdom, United States
Description
There is a paucity of data on research funding levels for male reproductive health (MRH). We investigated the research funding for MRH and infertility by examining publicly accessible web databases from the UK and USA government funding agencies. Information on the funding was collected from the UKRI-GTR, the NIHR’s Open Data Summary, and the USA’s NIH RePORT web databases. Funded projects between January 2016 and December 2019 were recorded and funding support was divided into three research categories: (i) male-based; (ii) female-based; and (iii) not-specified. Between January 2016 and December 2019, UK agencies awarded a total of £11,767,190 to 18 projects for male-based research and £29,850,945 to 40 projects for female-based research. There was no statistically significant difference in the median funding grant awarded within the male-based and female-based categories (p=0.56, W=392). The USA NIH funded 76 projects totalling $59,257,746 for male-based research and 99 projects totalling $83,272,898 for female-based research. Again, there was no statistically significant difference in the median funding grant awarded between the two research categories (p=0.83, W=3834). This is the first study examining funding granted by main government research agencies from the UK and USA for MRH. These results should stimulate further discussion of the challenges of tackling male infertility and reproductive health disorders and formulating appropriate investment strategies.

Methods

Experimental Design: Publicly accessible UK Research and Innovation (UKRI), National Institute for Health Research (NIHR), and National Institutes of Health (NIH) funding agency databases covering awards from January 2016 to December 2019 were examined (see Supplementary Table 1). Following the inclusion and exclusion criteria outlined within Supplementary Tables 2 and 3, funding data were collected on research proposals investigating infertility and reproductive health.
For simplicity, these are referred to collectively as ‘infertility research’. As the primary focus of this research is on infertility, the data were divided into three main categories: (i) male-based, (ii) female-based, and (iii) not-specified (Supplementary Table 2). The first two groups covered projects whose primary aim, based on the information presented in the research abstracts, timeline summaries and/or impact statements, was male- or female-focussed. “Not-specified” includes research projects that have either not specified a primary focus towards either male or female or have explicitly stated a focus on both. The process was conducted and reviewed by E.G. with C.L.R.B. Total funding for all three groups, funding over time, and comparison with overall funding for a particular agency was examined. Briefly, E.G. retrieved the primary data and produced the first set of data for discussion with C.L.R.B. Both went through the complete list, discussed each study/project, and decided: (a) whether it should be included, and (b) which category it fell under (male-, female-, or not-specified). The abstracts, which were almost always available and provided by each research study, were all examined and scrutinised by both E.G. and C.L.R.B. together. If there was clear disagreement between E.G. and C.L.R.B., which was very rare, the project would not be included.

UK Data Collection: From April 2018 the UK research councils, Innovate UK, and Research England are reported under one organization, the UKRI (2019). The councils independently fund research projects according to their respective visions and missions; however, until 2018/19, their annual funding expenditures were reported under the UKRI’s annual reports and budgets. The UKRI’s Gateway to Research (UKRI-GTR) web database allows users to analyse the information provided on taxpayer-funded research.
Relevant search terms such as “male infertility” or “female reproductive health” (see Supplementary Table 2) were applied with appropriate database filters (Supplementary Table 1). The relevance of each project award was determined by assessing the objectives in project abstracts, timeline summaries, and planned impacts. Supplementary Tables 1, 2 and 3 provide the search filters and the reference criteria for inclusion/exclusion used in the analysis. The UKRI-GTR provides the total funding granted to the projects within a designated period. Data obtained from the NIHR had minor differences. The NIHR has 6 datasets. The Open Data Summary View dataset was used as it provided details on funded projects, grants, summary abstracts, and project dates. As with the UKRI data, specific search terms and filters were applied to the NIHR Excel datasheet to sift out irrelevant projects (Supplementary Tables 1-3). The UKRI councils and NIHR report their annual expenditure and budgets for 1st April to 31st March. Thus, each project falls under the funding period in which its research activities begin (e.g. if a project’s research activities run from May 20th, 2017, to March 20th, 2019, the project is categorized under the funding period 2017/18). The projects collected began their investigations between January 2016 and December 2019, therefore 5 consecutive funding periods were examined (2015/16, 2016/17, 2017/18, 2018/19, and 2019/20). The UK data collection period ran from October 2020 to December 2020.

USA Data Collection: The NIH has a Research Portfolio Online Reporting Tools (RePORT) site providing access to its research activities, such as previously funded research, active research projects, and information on the NIH’s annual expenditures. The RePORT-Query database has similar features to the UKRI-GTR and NIHR, such as providing information on project abstracts, research impact, start and end dates, funding grants, and type of research.
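The UK funding-period rule described above (1st April to 31st March, with a project assigned to the period in which its research activities begin) can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name `uk_funding_period` is hypothetical, and the original analysis was carried out in RStudio.

```python
from datetime import date


def uk_funding_period(start: date) -> str:
    """Return the 1 April - 31 March funding period containing `start`.

    Hypothetical helper for illustration; the label format ("2017/18")
    follows the periods listed in the text.
    """
    # Dates from January to March belong to the period that began the previous April.
    first_year = start.year if start.month >= 4 else start.year - 1
    return f"{first_year}/{(first_year + 1) % 100:02d}"


# The example from the text: activities beginning on May 20th, 2017.
print(uk_funding_period(date(2017, 5, 20)))  # 2017/18
```

Under this rule, project start dates from January 2016 to December 2019 map onto the five periods 2015/16 to 2019/20 listed in the text.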
As with the UK data collection, appropriate search terms were entered with the database filters applied, and the same inclusion and exclusion criteria were followed (Supplementary Tables 1, 2, and 3). The UK and US agencies present data on funded research under different calendar and funding periods because US federal tax policy requires federal bodies to report all funding expenses under a fiscal year (FY). The NIH’s FY runs from October 1st to September 30th (e.g., FY2016 comprises funding activity from October 1st, 2015, to September 30th, 2016). Projects running over more than one such period are reported several times under consecutive fiscal years, and the funds are divided according to the annual period of the project’s activity. During data collection, 74 projects were found to be active with incomplete funding sums, as the NIH divides the grants according to the budgeting period of every FY. The NIH is in the process of granting funds for FY2021, so projects ending in 2020 or 2021 provide a complete funding sum. For the active projects ending after 2021, incomplete funding data are provided. It is assumed that the funding will have increased in value by the time the research ends, but the final awarded sum is unknown. To remain consistent with the UK data, the funding granted to each project is totalled as one figure and recorded under the FY in which the project first began research, whether the project is active or completed. Thus, US funding is referred to as “Current Total Funding”. In the RePORTER database, the NIH presents the same research project multiple times, once for every funded fiscal year, with consecutive project reference IDs. Therefore, for simplicity, we only included the first project reference ID. For more information on deciphering NIH project IDs, see https://era.nih.gov/files/Deciphering_NIH_Application.pdf.
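The fiscal-year rule and the “Current Total Funding” aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the function names, the record layout `(project_key, budget_start, amount)` and the sample records are all hypothetical, and how per-year NIH records are matched to one project is assumed to be handled upstream.

```python
from datetime import date


def nih_fiscal_year(d: date) -> int:
    """NIH fiscal year: FY2016 runs from October 1st, 2015 to September 30th, 2016."""
    return d.year + 1 if d.month >= 10 else d.year


def current_total_funding(records):
    """Sum per-year awards into one figure per project.

    `records` is an iterable of (project_key, budget_start, amount) tuples
    (a hypothetical layout). Each project is recorded under the fiscal year
    of its earliest budget start, mirroring the rule described in the text.
    """
    totals = {}
    for key, start, amount in records:
        entry = totals.setdefault(key, {"first_start": start, "total": 0})
        entry["first_start"] = min(entry["first_start"], start)
        entry["total"] += amount
    return {
        key: (nih_fiscal_year(entry["first_start"]), entry["total"])
        for key, entry in totals.items()
    }


# Made-up records for illustration only (not data from the study).
example_records = [
    ("P1", date(2016, 10, 1), 100),
    ("P1", date(2017, 10, 1), 150),
    ("P2", date(2016, 9, 30), 200),
]
print(current_total_funding(example_records))
```

For active projects the sum reflects only the awards reported so far, which is why the text describes the US figures as a current rather than a final total.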
For the USA, the initial data collection period ran from October 2020 to December 2020 but then restarted for a brief period in January 2021 to add up the remaining funding values for some of the active research projects.

Data Analysis: The data were divided into three main groups and organized by the funding period or FY in which the project was first awarded. RStudio (Version 1.3.1093) was used for the data analysis. Box-and-whisker plots are presented with rounded P-values. Kruskal-Wallis and Wilcoxon Rank Sum tests were performed to assess statistical significance. The data were independently collected and were not assumed to follow a normal distribution, so rank-based, non-parametric tests (Kruskal-Wallis and Wilcoxon Rank Sum) were used.

Research Project Details Included in the Collection Datasets: For both the UK and USA data, we included the following details:

The project (or study) titles
The Project IDs (also referred to as Project Reference or Project Number)
The project Start and End Dates
The project's Status (identified by the end dates or if explicitly stated in the database)
The Funding Organisation (for the UK) and Admin Institute (for the USA) that are funding the research
The project Category (i.e. Research Grants or Fellowships)
The Amount Granted (for the USA, the funding values were summed up to the most recent awarding date)

Rearranging/Processing Data for Analysis: After data collection was completed, the data were processed into a simpler format in Notepad in order to perform the statistical analyses in RStudio. Only the essential details were included, organised so that RStudio would recognise and analyse the information effectively and efficiently. The project Type (male, female or not-specified), the funding sum for the respective research project Type, and the funding period (UK) / FY (USA) were included. These details were then arranged appropriately to produce box-and-whisker plots with P-values, perform the chosen statistical tests, and produce the data statistics in RStudio. As mentioned earlier, the funding periods/fiscal years were assigned following the timeframes set out by the respective countries.
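To make the rank-based comparison concrete, the sketch below computes a Wilcoxon rank-sum W statistic (the rank sum of the first group minus its minimum possible value, the convention R's `wilcox.test` reports) together with a two-sided p-value from a tie-corrected normal approximation. It is an illustration only: the authors ran their tests in RStudio, the function name `wilcoxon_rank_sum` is hypothetical, and R may use an exact p-value or a continuity correction, so the figures need not match R's output.

```python
import math
from statistics import NormalDist


def wilcoxon_rank_sum(x, y):
    """Return (W, p) for two independent samples.

    W is the rank sum of `x` minus n1*(n1+1)/2. The two-sided p-value uses a
    normal approximation with tie correction and no continuity correction
    (an assumption of this sketch, not the source's stated method).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j

    w = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (w - mean) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p


# Made-up funding sums for illustration only (not data from the study).
male_based = [1, 2, 3]
female_based = [4, 5, 6]
w, p = wilcoxon_rank_sum(male_based, female_based)
print(w, round(p, 4))
```

A rank-based test of this kind compares the ordering of the funding sums rather than their raw values, which is why it suits data that are not assumed to be normally distributed.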

European Commission (2025). GERD by sector of performance [Dataset]. https://ec.europa.eu/eurostat/databrowser/view/TSC00001/default/table

GERD by sector of performance

Dataset updated

May 2, 2025

Dataset authored and provided by

European Commission (http://ec.europa.eu/)

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This collection provides users with data about R&D expenditure and R&D personnel broken down by the following institutional sectors: business enterprise (BES); government (GOV); higher education (HES); private non-profit (PNP), total of all sectors.

The R&D expenditure is broken down by source of funds; sector of performance; type of costs; type of R&D; fields of research and development (FORD); socio-economic objectives (NABS 2007, https://circabc.europa.eu/ui/group/c1b49c83-24a7-4ff2-951c-621ac0a89fd8/library/b4b841e5-d200-41bc-8f23-d0b1e034f689?p=1&n=10&sort=modified_DESC) and by regions (NUTS 2 level, https://showvoc.op.europa.eu/#/datasets/ESTAT_Nomenclature_of_Territorial_Units_for_Statistics/data). The business enterprise sector is further broken down by economic activity (NACE Rev.2, https://showvoc.op.europa.eu/#/datasets/ESTAT_Statistical_Classification_of_Economic_Activities_in_the_European_Community_Rev._2/data); size class; industry orientation.

R&D personnel data are broken down by professional position; sector of performance; educational attainment level; sex; field of research and development (FORD, https://www.oecd.org/innovation/frascati-manual-2015-9789264239012-en.htm); regions (NUTS 2 level, https://showvoc.op.europa.eu/#/datasets/ESTAT_Nomenclature_of_Territorial_Units_for_Statistics/data); the business enterprise sector is further broken down by size class and economic activity (NACE Rev.2). Researchers are further broken down by age class and citizenship.

R&D data are collected every two years, except for the key R&D indicators (R&D expenditure, R&D personnel in full-time equivalent (FTE) and researchers in FTE, by sector of performance), which are transmitted annually by the EU Member States (from 2003 onwards based on a legal obligation). Some other breakdowns of the data may appear on an annual basis based on voluntary data provisions.

The data are collected through sample or census surveys, from administrative registers or through a combination of sources.

R&D data are available for the following countries and country groups:

Countries: all EU Member States; Candidate Countries; EFTA Countries. The Organisation for Economic Cooperation and Development (OECD) is the data provider for the United States of America, Japan, South Korea and China.
Country groups: EU Member States, Euro Area States.

R&D data are compiled in accordance with the guidelines laid down in OECD (2015), Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, The Measurement of Scientific, Technological and Innovation Activities (https://www.oecd.org/publications/frascati-manual-2015-9789264239012-en.htm), and in the European business statistics methodological manual for R&D statistics – 2023 edition (Eurostat, Manuals and guidelines).
