https://dataverse.asu.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.48349/ASU/5FXHQU
The All of Us Research Hub contains a wide variety of data types, including survey responses, measurements, biosamples, electronic health records (EHRs), and data from mobile health devices, from participants who are healthy as well as those experiencing illness. The Registered Tier curated dataset contains individual-level data, available only to approved researchers on the Researcher Workbench. The Registered Tier currently includes data from electronic health records, survey answers, and physical measurements taken at the time of participant enrollment. Only authorized users who have registered with the All of Us Research Program can access the Registered Tier data. Authorized users can also access tools such as the Cohort Builder, Jupyter Notebooks, and Dataset Builder.
https://dataintelo.com/privacy-and-policy
The global clinical genomic data analysis market size was valued at USD 1.5 billion in 2023 and is projected to reach USD 6.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 17.2% during the forecast period. This market growth is driven by the increasing adoption of genomic sequencing technologies, advancements in bioinformatics, and the rising prevalence of chronic diseases that necessitate personalized medicine and targeted therapies.
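The growth figures quoted above can be sanity-checked with the standard compound-growth formula. The sketch below is illustrative only: it assumes nine compounding periods (2023 to 2032), which the report does not state explicitly, and the function names `cagr` and `project` are made up for this example.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.17 means 17%)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years


# Assumption: 2023 is the base year and 2032 the final year (9 periods).
periods = 2032 - 2023

# Rate implied by the quoted endpoints (USD 1.5 billion -> USD 6.3 billion).
implied_rate = cagr(1.5, 6.3, periods)

# Endpoint implied by the quoted 17.2% rate, in USD billions.
projected_2032 = project(1.5, 0.172, periods)

print(f"implied CAGR: {implied_rate:.1%}")
print(f"projected 2032 value: {projected_2032:.2f} USD bn")
```

Under that assumption the quoted rate and endpoints agree to within rounding; a different base year would shift the implied rate.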
A major growth factor for the clinical genomic data analysis market is the exponential increase in the volume of genomic data being generated. With the cost of sequencing dropping and the speed of sequencing increasing, more genomic data is being produced than ever before. This abundance of data requires sophisticated analysis tools and software to interpret and derive meaningful insights, driving the demand for advanced genomic data analysis solutions. Additionally, the integration of artificial intelligence and machine learning algorithms in genomics is further enhancing the capabilities of these analysis tools, enabling more accurate and faster data interpretation.
Another significant factor contributing to market growth is the rising incidence of genetic disorders and cancers, which necessitates comprehensive genomic analysis for accurate diagnosis and personalized treatment plans. Personalized medicine, which tailors medical treatment to the individual characteristics of each patient, relies heavily on the insights gained from genomic data analysis. As the understanding of the genetic basis of diseases deepens, the demand for clinical genomic data analysis is expected to surge, further propelling market growth.
The integration of NGS Informatics and Clinical Genomics is revolutionizing the field of personalized medicine. By leveraging next-generation sequencing (NGS) technologies, researchers and clinicians can now analyze vast amounts of genomic data with unprecedented speed and accuracy. This integration enables the identification of genetic variants that may contribute to disease, allowing for more precise diagnosis and the development of targeted therapies. As the capabilities of NGS technologies continue to expand, the role of informatics in managing and interpreting this data becomes increasingly critical. The seamless integration of NGS Informatics and Clinical Genomics is paving the way for more effective and personalized healthcare solutions, ultimately improving patient outcomes.
Government initiatives and funding in genomics research also play a crucial role in the expansion of the clinical genomic data analysis market. Many governments around the world are investing heavily in genomic research projects and infrastructure to advance medical research and improve public health outcomes. For instance, initiatives like the 100,000 Genomes Project in the UK and the All of Us Research Program in the US underscore the importance of genomics in understanding human health and disease, thereby boosting the demand for genomic data analysis tools and services.
Regional outlook reveals significant growth opportunities in emerging markets, particularly in the Asia Pacific region. Countries like China, India, and Japan are witnessing rapid advancements in healthcare infrastructure and increasing investments in genomics research. Additionally, favorable government policies and the presence of a large patient pool make this region a lucrative market for clinical genomic data analysis. North America continues to dominate the market due to high healthcare spending, advanced research facilities, and the early adoption of new technologies. Europe also shows steady growth with significant contributions from countries like the UK, Germany, and France.
The component segment of the clinical genomic data analysis market is divided into software and services. The software segment encompasses various bioinformatics tools and platforms used for genomic data analysis. These tools are essential for the effective management, storage, and interpretation of the massive amounts of genomic data generated. The growing complexity of genomic data necessitates the use of robust software solutions that can handle large datasets and provide accurate insights. As a result, the software segment is expected to witness significant growth during the forecast period.
The services segment includes
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are many initiatives attempting to harmonize data collection across human clinical studies using common data elements (CDEs). The increased use of CDEs in large prior studies can guide researchers planning new studies. For that purpose, we analyzed the All of Us (AoU) program, an ongoing US study intending to enroll one million participants and serve as a platform for numerous observational analyses. AoU adopted the OMOP Common Data Model to standardize both research (Case Report Form [CRF]) and real-world (imported from Electronic Health Records [EHRs]) data. AoU standardized specific data elements and values by including CDEs from terminologies such as LOINC and SNOMED CT. For this study, we defined all elements from established terminologies as CDEs and all custom concepts created in the Participant Provided Information (PPI) terminology as unique data elements (UDEs). We found 1 033 research elements, 4 592 element-value combinations and 932 distinct values. Most elements were UDEs (869, 84.1%), while most CDEs were from LOINC (103 elements, 10.0%) or SNOMED CT (60, 5.8%). Of the LOINC CDEs, 87 (53.1% of 164 CDEs) originated from previous data collection initiatives, such as PhenX (17 CDEs) and PROMIS (15 CDEs). On a CRF level, The Basics (12 of 21 elements, 57.1%) and Lifestyle (10 of 14, 71.4%) were the only CRFs with multiple CDEs. On a value level, 61.7% of distinct values are from an established terminology. AoU demonstrates the use of the OMOP model for integrating research and routine healthcare data (64 elements in both contexts), which allows for monitoring lifestyle and health changes outside the research setting. The increased inclusion of CDEs in large studies (like AoU) is important in facilitating the use of existing tools and improving the ease of understanding and analyzing the data collected, which is more challenging when using study specific formats.
https://www.ibisworld.com/about/termsofuse/
Clinical trial data management (CDM) providers have experienced robust growth in recent years, driven by several key factors. Two major catalysts are increasing demand for innovative therapies and treatments and the rising prevalence of chronic diseases worldwide. As pharmaceutical companies race to develop new drugs and biologics to address unmet medical needs, the volume and complexity of clinical trials have surged. This jump in clinical trial activity has fueled the need for efficient and reliable data management solutions to handle the vast amounts of data generated throughout the drug development process. At the same time, mounting scrutiny of clinical trial data integrity by regulatory bodies in the US and internationally has prompted pharmaceutical companies to outsource data management to support compliance and transparency. In all, revenue has expanded at a CAGR of 5.9% over the past five years to an estimated $8.9 billion, including expected growth of 2.7% in 2024. One central trend behind clinical trial data management providers’ growth is the increasingly complex clinical trial landscape. Medical and tech advances have made the clinical trial process more intricate, expanding the volume and variety of data collected during clinical trials and introducing significant challenges for data management. Clinical trial data management companies have taken on an increasingly vital role in addressing these challenges by providing specialized services. Outsourcing data management has been especially crucial for smaller biopharmaceutical companies that depend heavily on successful clinical trials but lack the capital or resources to invest in in-house capabilities. Outsourcing aspects of the research and development stage, including clinical trial data management, will become an increasingly attractive option for downstream pharmaceutical and medical device manufacturers, positioning the industry for growth.
Competition between smaller or mid-sized pharma and the leading multinational manufacturers to bring novel therapies to market will strengthen CDM companies’ role. An approaching patent cliff will also drive demand for clinical trial data management services as revenue declines and heightened competition from generic drugs accelerate clinical trial activity and cost mitigation efforts. Revenue will continue growing, rising at a CAGR of 3.3% over the next five years, reaching an estimated $10.5 billion in 2029.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
US: Research and Development Expenditure: % of GDP data was reported at 2.794 % in 2015. This records an increase from the previous number of 2.755 % for 2014. US: Research and Development Expenditure: % of GDP data is updated yearly, averaging 2.620 % from Dec 1996 (Median) to 2015, with 20 observations. The data reached an all-time high of 2.822 % in 2009 and a record low of 2.442 % in 1996. US: Research and Development Expenditure: % of GDP data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Technology. Gross domestic expenditures on research and development (R&D), expressed as a percent of GDP. They include both capital and current expenditures in the four main sectors: Business enterprise, Government, Higher education and Private non-profit. R&D covers basic research, applied research, and experimental development. Source: UNESCO Institute for Statistics. Aggregation method: weighted average. Each economy is classified based on the classification of World Bank Group's fiscal year 2018 (July 1, 2017-June 30, 2018).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Values correspond to mean (standard deviation) or N (%), All of Us Research Program, 2017–2019.
The CIRRIE Database of International Rehabilitation Research currently contains almost 90,000 citations of international rehabilitation research published from 1990 to the present, with new citations added monthly. The CIRRIE Database collects citations from all areas of rehabilitation and compiles them into one central source. In addition to indexing mainstream journals and internet sites, CIRRIE also includes citations to resources not readily available to U.S. researchers. Over 5000 journals are represented in the Database, and abstracts are available for most citations. A list of journals is now available. The database was created to facilitate the sharing of information and expertise in rehabilitation and disability research between the U.S. and other countries.
Expert industry market research on the Database, Storage & Backup Software Publishing in the US (2005-2031). Make better business decisions, faster with IBISWorld's industry market research reports, statistics, analysis, data, trends and forecasts.
The RVCAT database contains data that have been collected on various vessel operations on the Great Lakes and select connecting waterways. This section of Reference Tables specifically handles repetitive or standardized information that is called upon in the main tables of the RVCAT database. Reference tables are used in database design to standardize frequently used values and to make the data file efficient. All of the terms defined in the reference tables have been determined by the United States Geological Survey, Great Lakes Science Center and its partners. Data Quality: Note that the following data release is a snapshot of the database at the time of release. Some data quality checks are still being undertaken after the time of release. Also, a large section of this database consists of legacy data for which issues, if they arise, cannot be addressed, but which nevertheless adds great value to the database. When working with this data release, users are strongly encouraged to consult the Great Lakes Science Center's researchers for input. Distribution Liability Statement: Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.
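To illustrate the general idea of a reference table (a small lookup table that a main table points to by code), here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is hypothetical: the table names `ref_sample_type` and `op`, the column layout, the codes, and the descriptions are invented for the example and are not taken from the RVCAT schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    -- Hypothetical reference table: one row per standardized code.
    CREATE TABLE ref_sample_type (
        sample_type INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );

    -- Hypothetical main table: stores only the code, not the description.
    CREATE TABLE op (
        op_id       INTEGER PRIMARY KEY,
        sample_type INTEGER NOT NULL REFERENCES ref_sample_type (sample_type)
    );

    INSERT INTO ref_sample_type VALUES
        (1, 'hypothetical gear A'),
        (2, 'hypothetical gear B');

    INSERT INTO op VALUES (100, 1), (101, 2), (102, 1);
    """
)

# Resolve each code to its standardized description with a join.
rows = conn.execute(
    """
    SELECT op.op_id, ref.description
    FROM op
    JOIN ref_sample_type AS ref USING (sample_type)
    ORDER BY op.op_id
    """
).fetchall()

for op_id, description in rows:
    print(op_id, description)
```

Storing the description once in the lookup table, rather than on every main-table row, is what keeps values consistent and the data file compact.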
This is a collection of datasets that I personally think are useful for analysing COVID-19 data. Since all of the data come from the internet and the majority originated from the World Bank, I am sure some Kaggle users have already uploaded similar data. However, I think it makes my life (and perhaps yours) easier to compile all of these data together.
The following are some remarks for the dataset-
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes bibliographic information for 501 papers that were published from 2010-April 2017 (time of search) and use online biodiversity databases for research purposes. Our overarching goal in this study is to determine how research uses of biodiversity data developed during a time of unprecedented growth of online data resources. We also determine uses with the highest number of citations, how online occurrence data are linked to other data types, and if/how data quality is addressed. Specifically, we address the following questions:
1.) What primary biodiversity databases have been cited in published research, and which databases have been cited most often?
2.) Is the biodiversity research community citing databases appropriately, and are the cited databases currently accessible online?
3.) What are the most common uses, general taxa addressed, and data linkages, and how have they changed over time?
4.) What uses have the highest impact, as measured through the mean number of citations per year?
5.) Are certain uses applied more often for plants/invertebrates/vertebrates?
6.) Are links to specific data types associated more often with particular uses?
7.) How often are major data quality issues addressed?
8.) What data quality issues tend to be addressed for the top uses?
Relevant papers for this analysis include those that use online and openly accessible primary occurrence records, or those that add data to an online database. Google Scholar (GS) provides full-text indexing, which was important to identify data sources that often appear buried in the methods section of a paper. Our search was therefore restricted to GS. All authors discussed and agreed upon representative search terms, which were relatively broad to capture a variety of databases hosting primary occurrence records. The terms included: “species occurrence” database (8,800 results), “natural history collection” database (634 results), herbarium database (16,500 results), “biodiversity database” (3,350 results), “primary biodiversity data” database (483 results), “museum collection” database (4,480 results), “digital accessible information” database (10 results), and “digital accessible knowledge” database (52 results)--note that quotations are used as part of the search terms where specific phrases are needed in whole. We downloaded all records returned by each search (or the first 500 if there were more) into a Zotero reference management database. About one third of the 2500 papers in the final dataset were relevant. Three of the authors with specialized knowledge of the field characterized relevant papers using a standardized tagging protocol based on a series of key topics of interest. We developed a list of potential tags and descriptions for each topic, including: database(s) used, database accessibility, scale of study, region of study, taxa addressed, research use of data, other data types linked to species occurrence data, data quality issues addressed, authors, institutions, and funding sources. Each tagged paper was thoroughly checked by a second tagger.
The final dataset of tagged papers allows us to quantify general areas of research made possible by the expansion of online species occurrence databases, and trends over time. Analyses of these data will be published in a separate quantitative review.
== Quick facts ==
The most up-to-date and comprehensive podcast database available
All languages & all countries
Includes over 3,500,000 podcasts
Features 35+ data fields, such as basic metadata, global rank, RSS feed (with audio URLs), Spotify links, and more
Delivered in SQLite format
Learn how we build a high quality podcast database: https://www.listennotes.help/article/105-high-quality-podcast-database-from-listen-notes
== Use Cases ==
AI training, including speech recognition, generative AI, voice cloning / synthesis, and news analysis
Alternative data for investment research, such as sentiment analysis of executive interviews, market research and tracking investment themes
PR and marketing, including social monitoring, content research, outreach, and guest booking
...
== Data Attributes ==
See the full list of data attributes on this page: https://www.listennotes.com/podcast-datasets/fields/?filter=podcast_only
How to access podcast audio files: Our dataset includes RSS feed URLs for all podcasts. You can retrieve audio for over 170 million episodes directly from these feeds. With access to the raw audio, you’ll have high-quality podcast speech data ideal for AI training and related applications.
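As a rough illustration of pulling audio URLs out of an RSS feed, the sketch below parses a small hand-written feed with Python's standard library. It assumes the common RSS 2.0 convention of one `<enclosure url="...">` element per episode `<item>`; the sample feed, its example.com URLs, and the helper name `audio_urls` are invented for this example, and real feeds may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; in practice this text would be downloaded
# from an RSS feed URL stored in the dataset.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/audio/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
    <item>
      <title>Text-only post without audio</title>
    </item>
  </channel>
</rss>
"""


def audio_urls(feed_xml: str) -> list[str]:
    """Return the enclosure URL of every item that has one, in feed order."""
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Skip items with no enclosure or with an empty url attribute.
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


for url in audio_urls(SAMPLE_FEED):
    print(url)
```

Items without an enclosure are skipped rather than treated as errors, since not every feed entry necessarily carries audio.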
== Custom Offers ==
We can provide custom datasets based on your needs, such as language-specific data, daily/weekly/monthly update frequency, or one-time purchases.
We also provide a RESTful API at PodcastAPI.com
Contact us: hello@listennotes.com
== Need Help? ==
If you have any questions about our products, feel free to reach out at hello@listennotes.com.
== About Listen Notes, Inc. ==
Since 2017, Listen Notes, Inc. has provided the leading podcast search engine and podcast database.
All gillnet data represented here expand upon vessel operations (OP table) data, all of which are collected by the United States Geological Survey, Great Lakes Science Center and its partners. The Gillnet Tables contain data collected from the research vessel deploying various gear used for gillnet data collection. The database uses sample_type to indicate the gear deployed. The tables relating to Gillnet are: GN_Annulus.csv, GN_Catch.csv, GN_Effort.csv, GN_Fish.csv, GN_L, GN_LF.csv, GN_OP.csv, GN_Prey.csv, GN_Stomach.csv, LMMB_Fish_Prey.csv, and LMMB_Invert_Prey.csv. Data Quality: Note that the following data release is a snapshot of the database at the time of release. Some data quality checks are still being undertaken after the time of release. Also, a large section of this database consists of legacy data for which issues, if they arise, cannot be addressed, but which nevertheless adds great value to the database. When working with this data release, users are strongly encouraged to consult the Great Lakes Science Center's researchers for input. Distribution Liability Statement: Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Height of Land township population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Height of Land township. The dataset can be used to understand the population distribution of Height of Land township by age. For example, using this dataset, we can identify the largest age group in Height of Land township.
Key observations
The largest age group in Height of Land Township, Minnesota was 55 to 59 years, with a population of 116 (15.93%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Height of Land Township, Minnesota was 85 years and over, with a population of 10 (1.37%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
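The total population is not quoted in the key observations, but the two count and percentage pairs for Height of Land Township constrain it. The sketch below searches for whole-number totals that reproduce both percentages; it assumes the percentages were rounded to two decimal places, and the helper name `consistent_totals` and the search bound are choices made for this example.

```python
def consistent_totals(observations, max_total=10_000):
    """Return every total for which each (count, pct) pair rounds correctly.

    Assumes each percentage is 100 * count / total rounded to 2 decimals.
    """
    totals = []
    for total in range(1, max_total + 1):
        if all(round(100 * count / total, 2) == pct for count, pct in observations):
            totals.append(total)
    return totals


# (population, percentage) pairs quoted for the largest and smallest groups.
observations = [(116, 15.93), (10, 1.37)]

print(consistent_totals(observations))  # prints [728]
```

Under the rounding assumption, 728 is the only total consistent with both figures, which also serves as a quick check that the quoted counts and percentages agree with each other.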
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Height of Land township Population by Age.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of University Park by race. It includes the population of University Park across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of University Park across relevant racial categories.
Key observations
The percent distribution of University Park population by race (across all racial categories recognized by the U.S. Census Bureau): 9.99% are white, 84.77% are Black or African American, 0.08% are Asian, 1.29% are some other race and 3.87% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for University Park Population by Race & Ethnicity.
Expert industry market research on the Financial Data Service Providers in the US (2005-2030). Make better business decisions, faster with IBISWorld's industry market research reports, statistics, analysis, data, trends and forecasts.
https://www.ibisworld.com/about/termsofuse/
Number of Businesses statistics on the Database & Directory Publishing industry in United States
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Excel. The dataset can be used to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was 45 to 49 years, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was 85 years and over, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel Population by Age.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: HERD: Current PPP data was reported at 91.451 USD bn in 2022. This records an increase from the previous number of 85.787 USD bn for 2021. United States US: HERD: Current PPP data is updated yearly, averaging 34.717 USD bn from Dec 1981 (Median) to 2022, with 42 observations. The data reached an all-time high of 91.451 USD bn in 2022 and a record low of 7.085 USD bn in 1981. United States US: HERD: Current PPP data remains active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s United States – Table US.OECD.MSTI: Gross Domestic Expenditure on Research and Development: OECD Member: Annual.
For the United States, from 2021 onwards, changes to the US BERD survey questionnaire allowed for more exhaustive identification of acquisition costs for ‘identifiable intangible assets’ used for R&D. This has resulted in a substantial increase in reported R&D capital expenditure within BERD. In the business sector, the funds from the rest of the world previously included in the business-financed BERD are available separately from 2008. From 2006 onwards, GOVERD includes state government intramural performance (most of which is financed by the federal government and state governments' own funds). From 2016 onwards, PNPERD data are based on a new R&D performer survey. In the higher education sector, all fields of SSH are included from 2003 onwards.
Following a survey of federally-funded research and development centers (FFRDCs) in 2005, it was concluded that FFRDC R&D belongs in the government sector rather than the sector of the FFRDC administrator, as had been reported in the past. R&D expenditures by FFRDCs were reclassified from the other three R&D performing sectors to the Government sector; previously published data were revised accordingly. Between 2003 and 2004, the method used to classify data by industry was revised. This particularly affects the ISIC category “wholesale trade” and consequently the BERD for total services.
U.S. R&D data are generally comparable, but there are some areas of underestimation:
Breakdown by type of R&D (basic research, applied research, etc.) was also revised back to 1998 in the business enterprise and higher education sectors due to improved estimation procedures.
The methodology for estimating researchers was changed as of 1985. In the Government, Higher Education and PNP sectors the data since then refer to employed doctoral scientists and engineers who report their primary work activity as research, development or the management of R&D, plus, for the Higher Education sector, the number of full-time equivalent graduate students with research assistantships averaging an estimated 50 % of their time engaged in R&D activities. As of 1985 researchers in the Government sector exclude military personnel. As of 1987, Higher education R&D personnel also include those who report their primary work activity as design.
Due to lack of official data for the different employment sectors, the total researchers figure is an OECD estimate up to 2019. Comprehensive reporting of R&D personnel statistics by the United States has resumed with records available since 2020, reflecting the addition of official figures for the number of researchers and total R&D personnel for the higher education sector and the Private non-profit sector; as well as the number of researchers for the government sector. The new data revise downwards previous OECD estimates as the OECD extrapolation methods drawing on historical US data, required to produce a consistent OECD aggregate, appear to have previously overestimated the growth in the number of researchers in the higher education sector.
Pre-production development is excluded from Defence GBARD (in accordance with the Frascati Manual) as of 2000. 2009 GBARD data also includes the one time incremental R&D funding legislated in the American Recovery and Reinvestment Act of 2009. Beginning with the 2000 GBARD data, budgets for capital expenditure – “R&D plant” in national terminology - are included. GBARD data for earlier years relate to budgets for current costs only.
Population-based cancer incidence rates were abstracted from the National Cancer Institute's State Cancer Profiles for all counties in the United States for which data were available. This is a national county-level database of cancer data that are collected by state public health surveillance systems. All-site cancer is defined as any type of cancer that is captured in the state registry data, though non-melanoma skin cancer is not included. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for the years 2006–2010 were available for 2687 of 3142 (85.5%) counties in the U.S. Rates for counties with fewer than 16 reported cases in a specific area-sex-race category are suppressed to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because of state legislation and regulations that prohibit the release of county-level data to outside entities. Data from Michigan do not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data are not available for three states: Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 because it follows the period covered by the EQI exposure data, which were constructed to represent the years 2000–2005. We also gathered data for the three leading types of cancer for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as the exposure metric, serving as an indicator of cumulative environmental exposures at the county level for the period 2000 to 2005. A complete description of the datasets used in the EQI is provided in Lobdell et al., and the methods used for index construction are described by Messer et al.
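The coverage figure and the suppression rule described above can be checked with a short sketch. This is illustrative only and not code from the study: the function names are hypothetical, while the threshold of 16 cases and the county counts (2687 of 3142) are taken from the text.

```python
# Illustrative sketch: function names are hypothetical; the threshold and the
# county counts come from the dataset description above.

SUPPRESSION_THRESHOLD = 16  # categories with fewer reported cases are suppressed


def is_suppressed(case_count: int) -> bool:
    """Return True if a rate based on this many reported cases is suppressed."""
    return case_count < SUPPRESSION_THRESHOLD


def coverage_percent(available: int, total: int) -> float:
    """Percentage of counties with available rates, rounded to one decimal."""
    return round(100 * available / total, 1)


if __name__ == "__main__":
    print(coverage_percent(2687, 3142))          # 85.5, matching the text
    print(is_suppressed(15), is_suppressed(16))  # True False
```

The threshold is applied per area-sex-race category, so a county can have a published rate for one category and a suppressed rate for another.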
The EQI was developed for the period 2000–2005 because it was the time period for which the most recent data were available when index construction was initiated. The EQI includes variables representing each of the environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic environment includes 12 variables representing socioeconomics and crime. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
https://dataverse.asu.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.48349/ASU/5FXHQU
The All of Us Research Hub contains a wide variety of datatypes, including survey responses, measurements, biosamples, electronic health records (EHRs), and data from mobile health devices from participants who are healthy as well as experiencing illness. The Registered Tier curated dataset contains individual-level data, available only to approved researchers on the Researcher Workbench. The Registered Tier currently includes data from electronic health records, survey answers, and physical measurements taken at the time of participant enrollment. Only authorized users who have registered with the All of Us Research Program can access the Registered Tier data. Authorized users also can access tools such as the Cohort Builder, Jupyter Notebooks, and Dataset Builder.