100+ datasets found
  1. Data from: Data reuse and the open data citation advantage

    • data.niaid.nih.gov
    • search.dataone.org
    Cite
    Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv
    Available download formats
    zip
    Dataset updated
    Oct 1, 2013
    Dataset provided by
    National Evolutionary Synthesis Center
    Authors
    Heather A. Piwowar; Todd J. Vision
    License

    CC0-1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets.

    Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: it was clearest for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

    Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.

  2. Data_Sheet_1_Advanced large language models and visualization tools for data...

    • frontiersin.figshare.com
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Available download formats
    txt
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods: This article examines how learners perform, and how they perceive the experience, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results: The results showed the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.

  3. Importance of reasons for bringing engineering and research and development...

    • data.urbandatacentre.ca
    • datasets.ai
    Cite
    (2024). Importance of reasons for bringing engineering and research and development (R&D) services activities to Canada, by industry and enterprise size [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-62b78642-89dc-4cf8-a26a-216daf2bf573
    Dataset updated
    Oct 1, 2024
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    Percentage of enterprises for which specific reasons for bringing engineering and research and development (R&D) services activities to Canada were not at all important, somewhat important, important or very important, by North American Industry Classification System (NAICS) code and enterprise size, based on a three-year observation period. Reasons for bringing business activities to Canada include cost savings from locating abroad did not materialize (lower operating costs), labour costs abroad have risen (lower labour costs in Canada), better quality of labour or resources in Canada, lower Canadian dollar, consolidating number of suppliers, tax or other financial incentives, concerns about intellectual property, proximity to customers or other logistical issues, and other reasons related to engineering and research and development (R&D) services.

  4. Cline Center Coup d’État Project Dataset

    • databank.illinois.edu
    Cite
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
    Dataset updated
    May 11, 2025
    Authors
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (e.g., military, rebels), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
    Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material. The changes include:
    • Reconciling missing event data
    • Removing events with irreconcilable event dates
    • Removing events with insufficient sourcing (each event needs at least two sources)
    • Removing events that were inaccurately coded as coup events
    • Removing variables that fell below the threshold of inter-coder reliability required by the project
    • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
    • Extending the period covered from 1945-2005 to 1945-2019
    • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
    Items in this Dataset
    1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024.
    2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024.
    3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to find the sources used to identify that particular event. Revised February 2024.
    4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024.
    Citation Guidelines
    1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset), please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
    2. To cite data from the Cline Center Coup d’État Project Dataset, please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
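    A minimal Python sketch for looking up events by ID, assuming the event CSV has a header row containing a `coup_id` column and is UTF-8 encoded (the encoding is an assumption). It groups rows by `coup_id` so the ID can then be searched for in the Source Document. IDs such as 00201062021 begin with zeros, so the sketch keeps them as strings; parsing the column as a number would silently drop the leading zeros. The helper names and the `note` column in the demo data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def index_by_coup_id(csv_file):
    """Group CSV rows by coup_id, keeping IDs as strings.

    csv.DictReader returns every field as a str, so leading zeros survive.
    """
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[row["coup_id"]].append(row)
    return index


def load_index(path="Coup Data v2.1.3.csv"):
    """Hypothetical helper for the real file; assumes UTF-8 encoding."""
    with open(path, newline="", encoding="utf-8") as f:
        return index_by_coup_id(f)


# Demo on a hypothetical one-row extract (not real project data).
demo = io.StringIO("coup_id,note\n00201062021,hypothetical row\n")
index = index_by_coup_id(demo)
print(sorted(index))  # ['00201062021']
```

    Reading every field as text is the conservative choice here; numeric columns can be converted explicitly once their types are confirmed against the codebook.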

  5. Life Cycle inventory database - Dataset - CE data hub

    • datahub.digicirc.eu
    Cite
    (2022). Life Cycle inventory database - Dataset - CE data hub [Dataset]. https://datahub.digicirc.eu/dataset/life-cycle-inventory-database
    Dataset updated
    May 10, 2022
    Description

    (i) The CPM LCA Database is developed within the Swedish Life Cycle Center, and is a result of the continuous work to establish transparent and quality-reviewed LCA data. The Swedish Life Cycle Center (founded in 1996 and formerly called CPM) is a center of excellence for the advancement of life cycle thinking in industry and other parts of society through research, implementation, communication and exchange of experience on life cycle management. Its mission is to improve the environmental performance of products and services, as a natural part of sustainable development. The Center has been instrumental in the development and adoption of the life cycle perspective in Swedish companies and has made important contributions to international standardization in the life cycle field. For more information about the Center, see www.lifecyclecenter.se. The Swedish Life Cycle Center owns the CPM LCA Database, which is today maintained by Environmental Systems Analysis at the Department of Energy and Environment at Chalmers University of Technology.

    (ii) All LCI datasets can be viewed in three formats: the SPINE format, a format compatible with the ISO/TS 14048 LCA data documentation format criteria, and the ILCD format. Three impact assessment models, EPS, EDIP, and ECO-Indicator, can be viewed in the IA98 format. A simple IA calculator is also provided, with which the environmental impact of each LCI dataset can be calculated using the three different IA methods.

    (iii) unknown

    (iv) unknown
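    The description does not say how the IA calculator works internally. As a rough Python sketch of the general idea behind such calculations, assuming a simple linear method: each inventory flow amount is multiplied by a method-specific factor and the products are summed. The flow names, amounts and factors below are hypothetical placeholders, not values from EPS, EDIP or ECO-Indicator.

```python
def impact_score(inventory, factors):
    """Linear impact score: sum of (flow amount x factor).

    Flows that the chosen method has no factor for contribute zero.
    """
    return sum(amount * factors.get(flow, 0.0) for flow, amount in inventory.items())


# Hypothetical LCI result for one dataset (kg of each emission per functional unit).
inventory = {"CO2": 2.0, "CH4": 0.01, "NOx": 0.005}

# Hypothetical factors for one IA method; a real method supplies its own table.
factors = {"CO2": 1.0, "CH4": 25.0}

print(impact_score(inventory, factors))
```

    Swapping in a different factor table is what would give a different score per IA method; real methods may also include normalization or weighting steps that this sketch omits.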

  6. Michigan Public Policy Survey Public Use Datasets

    • openicpsr.org
    Cite
    Center for Local, State, and Urban Policy (2016). Michigan Public Policy Survey Public Use Datasets [Dataset]. http://doi.org/10.3886/E100132V30
    Available download formats
    delimited, spss, stata
    Dataset updated
    Aug 19, 2016
    Dataset authored and provided by
    Center for Local, State, and Urban Policy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Michigan
    Description

    The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. 
The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. Out of a commitment to promoting public knowledge of Michigan local governance, the Center for Local, State, and Urban Policy is releasing public use datasets. In order to protect respondent confidentiality, CLOSUP has divided the data collected in each wave of the survey into separate datasets focused on different topics that were covered in the survey. Each dataset contains only variables relevant to that subject, and the datasets cannot be linked together. Variables have also been omitted or recoded to further protect respondent confidentiality. For researchers looking for a more extensive release of the MPPS data, restricted datasets are available through openICPSR's Virtual Data Enclave. Please note: additional waves of MPPS public use datasets are being prepared, and will be available as part of this project as soon as they are completed. For information on accessing MPPS public use and restricted datasets, please visit the MPPS data access page: http://closup.umich.edu/mpps-download-datasets

  7. World Bank Country Survey 2012 - China

    • microdata.worldbank.org
    • catalog.ihsn.org
    Cite
    Public Opinion Research Group (2014). World Bank Country Survey 2012 - China [Dataset]. https://microdata.worldbank.org/index.php/catalog/1856
    Dataset updated
    Mar 14, 2014
    Dataset authored and provided by
    Public Opinion Research Group
    Time period covered
    2011 - 2012
    Area covered
    China
    Description

    Abstract

    The World Bank is interested in gauging the views of clients and partners who are either involved in development in China or who observe activities related to social and economic development. The World Bank Country Assessment Survey is meant to give the Bank's team that works in China more in-depth insight into how the Bank's work is perceived. This is one tool the Bank uses to assess the views of its critical stakeholders. With this understanding, the World Bank hopes to develop more effective strategies, outreach and programs that support development in China. The World Bank commissioned an independent firm to oversee the logistics of this effort in China.

    The survey was designed to achieve the following objectives:
    - Assist the World Bank in gaining a better understanding of how stakeholders in China perceive the Bank;
    - Obtain systematic feedback from stakeholders in China regarding:
        · Their views regarding the general environment in China;
        · Their perceived overall value of the World Bank in China;
        · Overall impressions of the World Bank as related to programs, poverty reduction, personal relationships, effectiveness, knowledge base, collaboration, and its day-to-day operation; and
        · Perceptions of the World Bank's communication and outreach in China.
    - Use data to help inform the China country team's strategy.

    Geographic coverage

    National

    Analysis unit

    Stakeholder

    Universe

    Stakeholders of the World Bank in China

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    From December 2011 through March 2012, 518 stakeholders of the World Bank in China were invited to provide their opinions on the Bank's assistance to the country by participating in a country survey. Participants in the survey were drawn from among employees of a ministry or ministerial department of central government; local government officials or staff; project management offices at the central and local level; the central bank; financial sector/banks; NGOs; regulatory agencies; state-owned enterprises; bilateral or multilateral agencies; private sector organizations; consultants/contractors working on World Bank supported projects/programs; the media; and academia, research institutes or think tanks.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire consists of 8 sections:

    1. Background Information: The first section asked respondents for their current position; specialization; familiarity, exposure to, and involvement with the Bank; and geographic location.

    2. General Issues facing China: Respondents were asked to indicate what they thought were the most important development priorities and which areas would contribute most to poverty reduction and economic growth in China, as well as to rate their perspective on the future of the next generation in China.

    3. Overall Attitudes toward the World Bank: Respondents were asked to rate the Bank's overall effectiveness in China, the extent to which the Bank's financial instruments meet China's needs, the extent to which the Bank meets China's need for knowledge services, and their agreement with various statements regarding the Bank's programs, poverty mission, relationships, and collaborations in China. Respondents were also asked to indicate the areas on which it would be most productive for the Bank to focus its resources and research, what the Bank's level of involvement should be, and what they felt were the Bank's greatest values and greatest weaknesses in its work.

    4. The Work of the World Bank: Respondents were asked to rate the level of importance of, and the Bank's level of effectiveness across, fifteen areas in which the Bank was involved, such as helping to reduce poverty and encouraging greater transparency in governance.

    5. The Way the World Bank does Business: Respondents were asked to rate the Bank's level of effectiveness in the way it does business, including the Bank's knowledge, personal relationships, collaborations, and poverty mission.

    6. Project/Program Related Issues: Respondents were asked to rate their level of agreement with a series of statements regarding the Bank's programs, day-to-day operations, and collaborations in China.

    7. The Future of the World Bank in China: Respondents were asked to rate how significant a role the Bank should play in China's development and to indicate what the Bank could do to make itself of greater value and what the greatest obstacle was to the Bank playing a significant role in China.

    8. Communication and Outreach: Respondents were asked to indicate where they get information about development issues and the Bank's development activities in China, as well as how they prefer to receive information from the Bank. Respondents were also asked to indicate their usage of the Bank's website and PICs, and to evaluate these communication and outreach efforts.

    Response rate

    A total of 207 stakeholders participated in the country survey (40%).
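    As a quick Python check of the stated response rate, using the 518 invited stakeholders from the sampling procedure and the 207 participants reported above:

```python
invited = 518       # stakeholders invited (Sampling procedure)
participated = 207  # stakeholders who took part (Response rate)

response_rate = participated / invited
print(f"{response_rate:.1%}")  # 40.0%
```

    207 / 518 is about 0.3996, which rounds to the 40% reported.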

  8. Data from: MODIRISK: Monitoring of Mosquito Vectors of Disease (inventory)

    • gbif.org
    • data.europa.eu
    Cite
    Veerle Versteirt; Wouter Dekoninck; Wim Van Bortel; Dimitri Brosens (2022). MODIRISK: Monitoring of Mosquito Vectors of Disease (inventory) [Dataset]. http://doi.org/10.15468/4fidg2
    Dataset updated
    May 2, 2022
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Belgian Biodiversity Platform
    Authors
    Veerle Versteirt; Wouter Dekoninck; Wim Van Bortel; Dimitri Brosens
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2007 - Dec 31, 2011
    Area covered
    Description

    MODIRISK aims to study the biodiversity of mosquitoes and to monitor and predict its changes, and hence actively prepares to address issues concerning the impact of biodiversity change, with particular reference to invasive species and the risk of introducing new pathogens. This is essential given the ongoing global changes that are creating suitable conditions for the spread of invasive species and the (re)emergence of vector-borne diseases in Europe. The main strengths of the project in the context of sustainable development are the link between biodiversity and health-environment, and its contribution to the development of tools to better describe the spatial distribution of mosquito biodiversity. MODIRISK addresses key topics of the global initiative Diversitas, which was one of the main drivers of the 'Research programme Science for a Sustainable Development' (SSD). This dataset contains the monitoring data.

    The project was coordinated by the Institute of Tropical Medicine (http://www.itg.be/E) in Antwerp.

  9. Household Health Survey 2012-2013, Economic Research Forum (ERF)...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Cite
    Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://datacatalog.ihsn.org/catalog/6937
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Economic Research Forum
    Kurdistan Regional Statistics Office (KRSO)
    Central Statistical Organization (CSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of the Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules collected in the context of the above-mentioned survey. The sample was then used to create a harmonized health survey comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the WB, the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on the IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and prepare indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure to support decision-making related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for the formulation of a new consumer price index number.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf, provided in the external resources, for further information on the mapping of the original variables onto the harmonized ones, as well as further details on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25,488 households for the whole of Iraq: 216 households in each of 118 districts, organized into 2,832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.
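    The design figures fit together: 216 households per district in clusters of 9 gives 24 clusters per district, matching the 24 sample points per district described under the sampling stages, and 118 districts then give 2,832 clusters and 25,488 households. A quick arithmetic check in Python (the constant names are illustrative only):

```python
# Illustrative check of the IHSES 2012 sample-design arithmetic.
DISTRICTS = 118
HOUSEHOLDS_PER_DISTRICT = 216
HOUSEHOLDS_PER_CLUSTER = 9

total_households = DISTRICTS * HOUSEHOLDS_PER_DISTRICT                      # whole-country sample
clusters_per_district = HOUSEHOLDS_PER_DISTRICT // HOUSEHOLDS_PER_CLUSTER   # sample points per district
total_clusters = DISTRICTS * clusters_per_district

print(total_households, clusters_per_district, total_clusters)  # 25488 24 2832
```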

    ----> Sample frame:

    The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted as the frame for selecting households in all governorates, including the Kurdistan Region. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district), for urban and rural areas, were selected systematically with probability proportional to size, giving 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster, so the total sample was 25,488 households distributed across the governorates, 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, with implicit stratification by urban/rural status and by geography (sub-district, quarter, street, county, village and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point by systematic equal-probability sampling. The sampling frame for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units; in that case a sampling unit is selected more than once, so two or more clusters may come from the same enumeration unit when necessary.
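    The two-stage selection can be sketched in Python. This is a minimal illustration, not the CSO's procedure: the block sizes, seed and function names are hypothetical, and the frame is assumed to be already sorted by urban/rural status and geography so that systematic selection yields the implicit stratification described above. When a block is hit more than once at stage 1, the sketch draws 9 households per hit in a single systematic pass so that the resulting clusters do not overlap.

```python
import random
from collections import Counter
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    A unit larger than the sampling interval can be hit more than once, which is
    how a small district can end up with two or more clusters in one unit.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))     # running totals of unit sizes
    selected, unit = [], 0
    for k in range(n):
        point = start + k * interval
        # advance to the first unit whose cumulative size exceeds the point
        while unit < len(sizes) - 1 and cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


def systematic_equal(population_size, n, rng):
    """Select n distinct positions out of population_size with equal probability."""
    interval = population_size / n
    start = rng.random() * interval
    return [min(int(start + k * interval), population_size - 1) for k in range(n)]


# Hypothetical frame: household counts for the blocks of one district (stratum),
# already in frame order.
block_sizes = [120, 45, 300, 80, 60, 210, 95, 150]
rng = random.Random(2012)

psus = systematic_pps(block_sizes, 24, rng)   # stage 1: 24 sample points
hits = Counter(psus)
clusters = []
for block, m in sorted(hits.items()):
    # stage 2: 9 households per hit, drawn in one pass, split into clusters of 9
    households = systematic_equal(block_sizes[block], 9 * m, rng)
    clusters.extend(households[i:i + 9] for i in range(0, 9 * m, 9))
```

Because the largest block (300 households) spans several sampling intervals, it is selected six or seven times, which is the repeated-selection situation the text describes.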

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was used as the basis for the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during implementation, and the final version was then used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts, each with several sections:

    Part 1: Socio-Economic Data
    - Section 1: Household Roster
    - Section 2: Emigration
    - Section 3: Food Rations
    - Section 4: Housing
    - Section 5: Education
    - Section 6: Health
    - Section 7: Physical Measurements
    - Section 8: Job Seeking and Previous Job

    Part 2: Monthly, Quarterly and Annual Expenditures
    - Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
    - Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
    - Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
    - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
    - Section 12, Table 1: Meals Had Within the Residential Unit
    - Section 12, Table 2: Number of Persons Other Than Household Members Participating in the Meals Within Household Expenditure

    Part 3: Income and Other Data
    - Section 13: Job
    - Section 14: Paid Jobs
    - Section 15: Agriculture, Forestry and Fishing
    - Section 16: Household Non-Agricultural Projects
    - Section 17: Income from Ownership and Transfers
    - Section 18: Durable Goods
    - Section 19: Loans, Advances and Subsidies
    - Section 20: Shocks and Household Coping Strategies
    - Section 21: Time Use
    - Section 22: Justice
    - Section 23: Life Satisfaction
    - Section 24: Food Consumption During the Past 7 Days

    Part 4: Diary of Daily Expenditures
    The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases over 7 days, such as food and frequent non-food items (gasoline, newspapers, etc.). Two pages are allocated to each day, so the diary consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: to ensure accuracy and consistency, the data were edited at the following stages:

    1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local supervisor: checks that the questions have been correctly completed.
    3. Statistical analysis: after the data files are exported from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values and to audit selected variables.
    4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STATA to examine and correct remaining inconsistencies in the data files. The software detects errors by checking questionnaire items against the expected parameters for each variable.
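    The range and logical checks mentioned in stages 3 and 4 can be sketched as rule functions applied to each record. The variables, bounds and consistency rules below are hypothetical examples, not the CSO's or the World Bank's actual edit specifications, and the original work used SPSS and STATA rather than Python.

```python
# Hypothetical edit rules: variable names and bounds are illustrative only.
RANGE_RULES = {
    "age": (0, 110),
    "household_size": (1, 40),
}


def range_errors(record):
    """Flag values that are missing or fall outside the expected range."""
    errors = []
    for var, (low, high) in RANGE_RULES.items():
        value = record.get(var)
        if value is None:
            errors.append(f"{var}: missing")
        elif not low <= value <= high:
            errors.append(f"{var}: {value} outside [{low}, {high}]")
    return errors


def logic_errors(record):
    """Flag combinations of answers that cannot both be true (assumed rules)."""
    errors = []
    age = record.get("age")
    if age is not None and age < 10 and record.get("marital_status") == "married":
        errors.append("age/marital_status: married under 10")
    if record.get("attending_school") is False and record.get("current_grade") is not None:
        errors.append("attending_school/current_grade: grade reported but not attending")
    return errors


def check(record):
    """Run all range and logical checks on one record."""
    return range_errors(record) + logic_errors(record)


records = [
    {"id": 1, "age": 34, "household_size": 6, "marital_status": "married",
     "attending_school": False, "current_grade": None},
    {"id": 2, "age": 7, "household_size": 6, "marital_status": "married",
     "attending_school": True, "current_grade": 2},
    {"id": 3, "age": 150, "household_size": 0, "marital_status": "single",
     "attending_school": False, "current_grade": 5},
]
flagged = {r["id"]: check(r) for r in records if check(r)}
```

Record 1 passes, record 2 fails one logical check, and record 3 fails two range checks and one logical check; flagged records would then go back for review.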

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data are saved at the household and individual levels in SPSS and then converted to STATA for dissemination.
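    The renaming step at the heart of this harmonization can be sketched as a lookup from each survey year's original variable names to common names, followed by aggregation to the household level. The ERF work itself was done in SPSS; this Python sketch is only illustrative, and every variable name in it is hypothetical (the real correspondence is documented in the Variables Mapping & Availability Matrix).

```python
from collections import Counter

# Hypothetical mapping from survey-specific variable names to harmonized names.
VARIABLE_MAP = {
    2007: {"q1_sex": "sex", "q1_age": "age", "hh_id": "hhid"},
    2012: {"s1_q3": "sex", "s1_q4": "age", "hhid12": "hhid"},
}


def harmonize(record, year):
    """Rename one individual-level record's variables to the harmonized names."""
    mapping = VARIABLE_MAP[year]
    out = {new: record[old] for old, new in mapping.items() if old in record}
    out["year"] = year
    return out


def household_level(individuals):
    """Collapse harmonized individual records to one row per household."""
    sizes = Counter((p["year"], p["hhid"]) for p in individuals)
    return [{"year": y, "hhid": h, "hhsize": n} for (y, h), n in sorted(sizes.items())]


people = [
    harmonize({"q1_sex": 1, "q1_age": 40, "hh_id": 101}, 2007),
    harmonize({"q1_sex": 2, "q1_age": 38, "hh_id": 101}, 2007),
    harmonize({"s1_q3": 2, "s1_q4": 25, "hhid12": 7}, 2012),
]
households = household_level(people)
```

After renaming, records from both years share the same variable names, so they can be pooled and compared directly.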

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. 305 households refused to respond, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest was in Sulaimaniya (92%).
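    The response rate is the share of eligible sampled households that completed an interview. A quick check of the figures above shows that counting only the 305 refusals would give about 98.8%, slightly above the published 98.6%; the gap presumably reflects other forms of non-response that this summary does not break out. The function name below is illustrative.

```python
def response_rate(completed, eligible):
    """Share of eligible sampled households that completed the interview."""
    return completed / eligible


sampled = 25_488
refusals = 305

# Treating refusals as the only non-response (an assumption: other categories,
# such as non-contact, are not itemized in the summary).
rate_refusals_only = response_rate(sampled - refusals, sampled)
print(f"{rate_refusals_only:.1%}")  # 98.8%
```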

  10. Data from: S8 Fig -

    • plos.figshare.com
    zip
    Updated Aug 3, 2023
    Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc (2023). S8 Fig - [Dataset]. http://doi.org/10.1371/journal.pone.0289211.s009
    Available download formats: zip
    Dataset updated
    Aug 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Deep learning (DL) techniques have seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for the development of automated diagnostic tools. The ease of non-invasive acquisition makes retinal fundus imaging particularly amenable to such automated approaches. Recent work in the analysis of fundus images using CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions stymie the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase results for the performance of DL on small datasets to classify patient sex from fundus images, a trait thought not to be present or quantifiable in fundus images until recently. Specifically, we fine-tune a ResNet-152 model whose last layer has been modified to a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in the dataset size compared to prior results in the literature. Our results show that binary classification, even with a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained with one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor quality images may hamper training of the CNN by reducing the already small dataset size even further. Nevertheless, using high quality images may be an important factor, as evidenced by the superior generalizability of results in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool in maximizing the performance of deep CNNs in the context of small development datasets.
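    Reproducing the ResNet-152 fine-tuning is beyond a short example, but the two evaluation ideas in the abstract, scoring with AUC and ensembling, can be sketched in plain Python. This is a hedged illustration, not the authors' code: the model scores below are made up, and the ensemble simply averages predicted probabilities, which is one common ensembling scheme (the abstract does not say which scheme was used). AUC is computed as the Mann-Whitney probability that a randomly chosen positive image scores higher than a randomly chosen negative one.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble(prob_lists):
    """Average the predicted probabilities of several models, image by image."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


# Hypothetical labels (1 = positive class) and per-image scores from two models.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.875, 0.375, 0.75, 0.5, 0.25, 0.625]
model_b = [0.625, 0.75, 0.375, 0.25, 0.5, 0.125]

averaged = ensemble([model_a, model_b])
print(auc(labels, model_a), auc(labels, model_b), auc(labels, averaged))
```

On this toy data the averaged scores separate the classes perfectly while neither model does alone; that is a property of the invented numbers, not a general guarantee.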

  11. Expenditure and Consumption Survey, 2004 - West Bank and Gaza

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Mar 29, 2019
    Palestinian Central Bureau of Statistics (2019). Expenditure and Consumption Survey, 2004 - West Bank and Gaza [Dataset]. https://catalog.ihsn.org/index.php/catalog/3085
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Time period covered
    2004 - 2005
    Area covered
    Palestine, West Bank
    Description

    Abstract

    The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It captures the contribution of the household sector to the Gross National Product (GNP). Household surveys also help determine the incidence of poverty and provide weighted data reflecting the relative importance of consumption items, which are used to set the benchmark for the rates and prices of goods and services. More generally, the Household Expenditure and Consumption Survey is a cornerstone for studying nutritional status in the Palestinian territory.

    The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.

    Geographic coverage

    The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.

    Analysis unit

    1- Household/families. 2- Individuals.

    Universe

    The survey covered all Palestinian households usually resident in the Palestinian Territory.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sample and Frame:

    The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.

    Sample Design:

    The sample is a two-stage stratified cluster systematic random sample. First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. One person aged 18 years or older was selected from each household in the second stage.

    Sample strata:

    The population was divided by: 1- Governorate 2- Type of Locality (urban, rural, refugee camps)

    Sample Size:

    The calculated sample size is 3,781 households.

    Target cluster size:

    The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.
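    A minimal sketch of this two-stage stratified systematic design in Python, assuming a frame already split into strata (governorate by locality type) and listed in frame order. The strata names, EA identifiers and allocation are illustrative, the 120-household EA size is taken from the average quoted above, and the sample take of 12 matches the target cluster size; the actual allocation across strata is in the user manual and is not reproduced here.

```python
import random


def systematic_sample(units, n, rng):
    """Systematic random sample: random start, then every (len(units)/n)-th unit."""
    step = len(units) / n
    start = rng.random() * step
    return [units[min(int(start + i * step), len(units) - 1)] for i in range(n)]


def stratified_systematic(frame, allocation, rng):
    """Draw a separate systematic sample of enumeration areas in each stratum.

    frame: {stratum: [EA ids in frame order]}; allocation: {stratum: number of EAs}.
    """
    return {stratum: systematic_sample(frame[stratum], n, rng)
            for stratum, n in allocation.items()}


rng = random.Random(2004)
# Hypothetical strata (governorate x locality type) and EA lists.
frame = {
    ("Governorate A", "urban"): [f"U{i:02d}" for i in range(40)],
    ("Governorate A", "camp"): [f"C{i:02d}" for i in range(10)],
}
allocation = {("Governorate A", "urban"): 4, ("Governorate A", "camp"): 2}

ea_sample = stratified_systematic(frame, allocation, rng)   # first stage
households = {                                               # second stage
    ea: systematic_sample([f"{ea}-H{j:03d}" for j in range(120)], 12, rng)
    for eas in ea_sample.values() for ea in eas
}
```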

    Detailed information/formulas on the sampling design are available in the user manual.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The PECS questionnaire consists of two main sections:

    First section: certain items on the form are filled in at the beginning of the month, and the remainder at the end of the month. This section includes the following items:

    Cover sheet: contains the particulars of the family, the date of visit, the particulars of the field/office work team, and the number and sex of the family members.

    Statement of the family members: Contains social, economic and demographic particulars of the selected family.

    Statement of the long-lasting commodities and income generation activities: includes a number of basic and indispensable items (e.g., livestock or agricultural land).

    Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.

    Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.

    Second section: the second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each group contains important commodities; in total, the groups cover 667 commodity and service items. Groups 1-21 include food, drink and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each group were collected over different time intervals so as to reflect expenditure over one full year.
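    The group numbering can be expressed as a simple lookup. Groups 46-49 are not described in this summary, so the hypothetical function below returns None for them rather than guessing their content.

```python
def expenditure_category(group):
    """Map a PECS consumption/expenditure group number (1-54) to its broad category."""
    if not 1 <= group <= 54:
        raise ValueError(f"group must be between 1 and 54, got {group}")
    if group <= 21:
        return "food, drink and cigarettes"
    if group == 22:
        return "homemade commodities"
    if group <= 45:
        return "non-food items"          # all items except food, drink and cigarettes
    if group >= 50:
        return "long-lasting commodities"
    return None                          # groups 46-49: not described in the summary
```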

    Cleaning operations

    Raw Data

    Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire and included various electronic checks: logical checks, range checks, consistency checks and cross-validation. The results were fully inspected manually after data entry, and questionnaires containing field-related errors were sent back to the field for correction.

    Harmonized Data

    • The Statistical Package for Social Science (SPSS) is used to clean and harmonize the datasets.
    • The harmonization process starts with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files are then all merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process is run on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and converted to STATA format.

    Response rate

    The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. A total of 3,098 households completed the interview, of which 2,060 were in the West Bank and 1,038 in the Gaza Strip. The response rate was 82% in the Palestinian Territory.
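    The counts above are consistent: the two regional totals sum to 3,098 completed interviews, and 3,098 / 3,781 rounds to 82%. A minimal Python check (variable names are illustrative):

```python
sampled = 3_781
completed = {"West Bank": 2_060, "Gaza Strip": 1_038}

total_completed = sum(completed.values())
rate = total_completed / sampled            # completed interviews / sampled households
print(total_completed, f"{rate:.0%}")       # 3098 82%
```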

    Sampling error estimates

    Calculating standard errors for the main survey estimates enables users to assess the accuracy of the estimates and the reliability of the survey. The total error of the survey can be divided into two kinds: statistical errors and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, poor statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and the conduct of all related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate such errors numerically because there are no technical computation methods based on theoretical principles to tackle them. Statistical errors, on the other hand, can be measured, most often by the standard error, which is the positive square root of the variance. The variance for this survey was computed using the software package CENVAR.
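    CENVAR itself is not reproduced here. As a hedged illustration of the same idea, the sketch below computes a weighted mean and its standard error with the common Taylor-linearization ("ultimate cluster") variance estimator for stratified cluster samples, treating PSUs as drawn with replacement within strata. Whether CENVAR uses exactly this estimator is not stated in the source, and the data are made up.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(records):
    """records: iterable of (stratum, psu, weight, y). Returns (mean, standard error)."""
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w
    # Linearized score of each PSU: sum of w * (y - mean) / total_w over its records
    psu_scores = defaultdict(float)
    for stratum, psu, w, y in records:
        psu_scores[(stratum, psu)] += w * (y - mean) / total_w
    by_stratum = defaultdict(list)
    for (stratum, _), score in psu_scores.items():
        by_stratum[stratum].append(score)
    variance = 0.0
    for scores in by_stratum.values():
        n = len(scores)
        if n < 2:
            continue  # a single-PSU stratum contributes nothing in this simple sketch
        avg = sum(scores) / n
        variance += n / (n - 1) * sum((s - avg) ** 2 for s in scores)
    return mean, sqrt(variance)  # standard error = positive square root of the variance


# Hypothetical sample: two strata with two PSUs each.
sample = [
    (1, "a", 1.0, 2.0), (1, "a", 1.0, 4.0), (1, "b", 1.0, 6.0), (1, "b", 1.0, 8.0),
    (2, "c", 2.0, 1.0), (2, "c", 2.0, 3.0), (2, "d", 2.0, 5.0), (2, "d", 2.0, 7.0),
]
mean, se = weighted_mean_se(sample)
```

Here the weighted mean is 52/12 and the linearized variance works out to 20/9, so the standard error is its square root, about 1.49.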

  12. Living Standards Measurement Survey 2003 (Wave 3 Panel) - Bosnia-Herzegovina...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jan 30, 2020
    State Agency for Statistics (BHAS) (2020). Living Standards Measurement Survey 2003 (Wave 3 Panel) - Bosnia-Herzegovina [Dataset]. https://microdata.worldbank.org/index.php/catalog/67
    Dataset updated
    Jan 30, 2020
    Dataset provided by
    Federation of BiH Institute of Statistics (FIS)
    State Agency for Statistics (BHAS)
    Republika Srpska Institute of Statistics (RSIS)
    Time period covered
    2003
    Area covered
    Bosnia and Herzegovina
    Description

    Abstract

    In 2001, the World Bank, in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS). In addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, the LSMS has three basic objectives:

    1. To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.

    2. To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.

    3. To provide key contributions for development of government's Poverty Reduction Strategy Paper, based on analyzed data.

    The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further two years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS). Birks Sinclair & Associates Ltd. were responsible for the management of the HSPS, with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK. The panel survey provides longitudinal data by re-interviewing approximately half the LSMS respondents for two years following the LSMS, in the autumn of 2002 and 2003. The LSMS constitutes Wave 1 of the panel survey, so there are three years of panel data available for analysis. For the purposes of this documentation, the following convention is used to describe the different rounds of the panel survey:

    - Wave 1: LSMS conducted in 2001, forming the baseline survey for the panel
    - Wave 2: second interview of 50% of LSMS respondents in Autumn/Winter 2002
    - Wave 3: third interview with sub-sample respondents in Autumn/Winter 2003

    The panel data allow the analysis of key transitions and events over this period, such as labour market or geographical mobility, and the observation of the consequent outcomes for the well-being of individuals and households in the survey. The panel data provide information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty is experienced by different types of households and individuals over the three-year period. Most importantly, the covariates associated with moves into and out of poverty, and the relative risks of poverty for different people, can be assessed. As such, the panel aims to provide data that will inform the policy debates within FBiH and RS at a time of social reform and rapid change.

    Geographic coverage

    National coverage. Domains: Urban/rural/mixed; Federation; Republic

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Wave 3 sample consisted of 2,878 households that had been interviewed at Wave 2, plus a further 73 households that were interviewed at Wave 1 but were non-contacts at Wave 2. A total of 2,951 households (1,301 in the RS and 1,650 in FBiH) were issued for Wave 3. As at Wave 2, the sample could not be replaced with any other households.
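    The Wave 3 issued-sample figures can be reconciled in a few lines of Python: both the split by source of sample and the split by entity give 2,951 households (variable names are illustrative).

```python
wave2_interviewed = 2_878   # households interviewed at Wave 2
wave1_only = 73             # interviewed at Wave 1, non-contact at Wave 2
issued_by_entity = {"RS": 1_301, "FBiH": 1_650}

issued = wave2_interviewed + wave1_only
print(issued, sum(issued_by_entity.values()))  # 2951 2951
```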

    Panel design

    Eligibility for inclusion

    The household and household membership definitions are the same standard definitions as at Wave 2. Sample membership status and eligibility for interview are as follows:

    i) All members of households interviewed at Wave 2 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview.
    ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs).
    iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below).
    iv) All household members aged 15 or over, including OSMs and NSMs, are eligible for interview.
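    Rules i-iv can be read as a small decision procedure. The Python sketch below is a simplified, hypothetical encoding (the `Member` fields are invented for illustration, and out-of-scope status is taken as given rather than derived from the mover rules).

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    age: int
    in_wave2_household: bool    # member of a household interviewed at Wave 2
    out_of_scope: bool = False  # e.g. moved abroad


def sample_status(member, household_has_osm):
    """Return "OSM", "NSM", or None (not a sample member)."""
    if member.in_wave2_household:
        return "OSM"            # rule i
    if household_has_osm:
        return "NSM"            # rule ii
    return None


def eligible_for_interview(member, household_has_osm):
    """Included as a sample member (rule iii) and aged 15 or over (rule iv)."""
    status = sample_status(member, household_has_osm)
    included = status is not None and not member.out_of_scope
    return included and member.age >= 15


household = [
    Member("parent", 42, in_wave2_household=True),
    Member("child", 9, in_wave2_household=True),
    Member("new spouse", 30, in_wave2_household=False),
]
has_osm = any(m.in_wave2_household for m in household)
statuses = [sample_status(m, has_osm) for m in household]
interview = [eligible_for_interview(m, has_osm) for m in household]
```

The child remains an OSM but is not interviewed because of the age rule, while the new spouse joins as an NSM and is eligible for interview.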

    Following rules

    The panel design means that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together but in others an individual member may move away from their previous wave household and form a new split-off household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefit of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.

    Definition of 'out-of-scope'

    It is important to maintain movers within the sample to maintain sample sizes and reduce attrition and also for substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are as follows:

    i. Movers out of the country altogether, i.e. outside FBiH and RS. This category of mover is clear-cut: sample members moving to another country outside FBiH and RS will be out-of-scope for that year of the survey and not eligible for interview.

    ii. Movers between entities. Respondents moving between entities are followed for interview. The personal details of the respondent are passed between the statistical institutes and a new interviewer is assigned in that entity.

    iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, Wave 3 individuals who have subsequently moved into certain institutions are followed. The definitions of which institutions are included are found in the Supervisor Instructions.

    iv. Movers into the district of Brcko. These are followed for interview. When coding entity, Brcko is treated as the entity from which the household that moved into Brcko originated.
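    The four mover rules can be sketched as a single Python function returning whether a mover is followed and which entity is coded. This is a hypothetical encoding: the location labels and the `institution_included` flag (standing in for the Supervisor Instructions' list of institutions) are invented, and keeping the origin entity for institutional movers is an assumption the rules do not state.

```python
def follow_mover(origin_entity, new_location, institution_included=True):
    """Decide whether a mover is followed and which entity is coded.

    origin_entity: "FBiH" or "RS" (entity of the previous-wave household).
    new_location: "abroad", "FBiH", "RS", "Brcko" or "institution".
    Returns (followed, coded_entity).
    """
    if new_location == "abroad":
        return False, None                 # rule i: out-of-scope for that year
    if new_location == "institution":
        # rule iii: only some institutions are followed; coding the origin
        # entity here is an assumption
        return (True, origin_entity) if institution_included else (False, None)
    if new_location == "Brcko":
        return True, origin_entity         # rule iv: coded as the entity of origin
    if new_location in ("FBiH", "RS"):
        return True, new_location          # rule ii: followed across entities
    raise ValueError(f"unknown location: {new_location!r}")
```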

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Questionnaire design

    Approximately 90% of the questionnaire (Annex B) is based on the Wave 2 questionnaire, carrying forward core measures that are needed to measure change over time. The questionnaire was widely circulated and changes were made as a result of comments received.

    Pretesting

    To undertake a longitudinal test, the Wave 2 pretest sample was used. The Control Forms and Advance Letters were generated from an Access database containing details of ten households in Sarajevo and fourteen in Banja Luka. The pretest was carried out from March 24 to April 4, and 24 households (51 individuals) were successfully interviewed. One mover household was successfully traced and interviewed.
    To test the questionnaire under the hardest circumstances, no briefing was held. A list of the main questionnaire changes was given to experienced interviewers.

    Issues arising from the pretest

    Interviewers were asked to complete a Debriefing and Rating form. The debriefing form captured opinions on the following three issues:

    1. General reaction to being re-interviewed. In some cases respondents were wary of being asked to participate again, with some individuals asking “Why me?” Interviewers did a good job of persuading people to take part: only one household refused, and another asked to be removed from the sample the following year. Having the same interviewer return to the same households was considered an advantage. Most respondents asked what benefit they would gain from taking part in the survey. This aspect was re-emphasised in the Advance Letter, the Respondent Report and the training of the Wave 3 interviewers.

    2. Length of the questionnaire. The average interview took 30 minutes. No problems were mentioned in relation to timing, though interviewers noted that some respondents, particularly the elderly, tended to wander off the point and needed to be brought back to the questions in the questionnaire. One interviewer noted that the economic situation of many respondents seemed to have worsened since the previous year, and it was necessary to listen to respondents’ “stories” during the interview.

    3. Confidentiality. No problems were mentioned in relation to confidentiality, though interviewers suggested it might be worth mentioning the new Statistics Law in the Advance Letter. The Rating Form asked for details of specific questions that were unclear. These are described below, with a description of the changes made.

    • Module 3. Q29-31 have been added to capture funds received for education, scholarships etc.

    • Module 4. Pretest respondents complained that the 6 questions on “Has your health limited you...” and the 16 on “In the last 7 days have you felt depressed...” etc. were too many. These were reduced by half (Q38-Q48). The LSMS data were examined, and the questions with the widest variability between answers were chosen.

    • Module 5. The new employment questions (Q42-Q44) worked well and have been kept in the main questionnaire.

    • Module 7. There were no problems reported with adding the credit questions (Q28-Q36)

    • Module 9. SIG recommended that some of Questions 1-12 were relevant only to those aged over 18 so additional skips have been added. Some respondents complained the questionnaire was boring. To try and overcome

  13. Coho Distribution [ds326]

    • catalog.data.gov
    • data.ca.gov
    Updated Nov 27, 2024
    California Department of Fish and Wildlife (2024). Coho Distribution [ds326] [Dataset]. https://catalog.data.gov/dataset/coho-distribution-ds326-cc8ae
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Department of Fish and Wildlife
    Description

    November 2022 Version

    This dataset represents the "Observed Distribution" for coho salmon in California, using only observations made between 1990 and the present. It was developed for the express purpose of assisting with species recovery planning efforts. The process for developing this dataset was to collect as many observations of the species as possible and derive the stream-based geographic distribution for the species based solely on these positive observations.

    For the purpose of this dataset, an observation is defined as a report of a sighting or other evidence of the presence of the species at a given place and time. As such, observations are modeled by year observed as point locations in the GIS. All observations were collected with information on who reported the observation, their agency/organization/affiliation, the date on which they observed the species, who compiled the information, etc. This information is maintained in the developer's file geodatabase (©Environmental Systems Research Institute (ESRI) 2016).

    To develop this distribution dataset, the species observations were applied to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography. For each observation, a path was traced down the hydrography from the point of observation to the ocean, thereby deriving the shortest migration route from the point of observation to the sea. Appending all of these migration paths together yields the "Observed Distribution" for the species.

    It is important to note that this layer does not attempt to model the entire possible distribution of the species. Rather, it represents only the known distribution, based on where the species has been observed and reported. While some observations do represent the upstream extent of the species (e.g., an observation made at a hard barrier), most observations indicate only where the species was sampled for or otherwise observed. Because of this, this dataset likely underestimates the absolute geographic distribution of the species.

    It is also important to note that the species may not be found every year in all indicated reaches, due to natural variations in run size, water conditions, and other environmental factors. As such, the information in this dataset should not be used to verify that the species is currently present in a given stream. Conversely, the absence of distribution linework for a given stream does not necessarily indicate that the species does not occur in that stream.

    The observation data were compiled from a variety of disparate sources, including but not limited to CDFW, USFS, NMFS, timber companies, and the public. Forms of documentation include CDFW administrative reports, personal communications with biologists, observation reports, and literature reviews. The source of each feature (to the best available knowledge) is included in the data attributes for the observations in the geodatabase, but not for the resulting linework. The spatial data have been referenced to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography.

    Usage of this dataset:

    Examples of appropriate uses include:
    - Species recovery planning
    - Evaluation of future survey sites for the species
    - Validating species distribution models

    Examples of inappropriate uses include:
    - Assuming that the absence of a line feature means the species is not present in that stream
    - Using this data to make parcel-level or ground-level land use management decisions
    - Using this dataset to prove or support the non-existence of the species at any spatial scale
    - Assuming that the line feature represents the maximum possible extent of the species' distribution

    All users of this data should seek the assistance of qualified professionals such as surveyors, hydrologists, or fishery biologists as needed to ensure that they possess complete, precise, and up-to-date information on species distribution and water body location.

    Any copy of this dataset is considered a snapshot of the species distribution at the time of release. It is incumbent upon the user to ensure that they have the most recent version before making management or planning decisions. Please refer to the "Use Constraints" section below.
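    The path-tracing step described above (tracing each observation down the hydrography to the ocean and appending the paths) can be sketched in Python as follows. This is a minimal illustration, not the CDFW workflow: it assumes a hypothetical, tree-structured (dendritic) stream network stored as a mapping from each reach ID to its single downstream reach, so the route to the sea is unique. Braided or distributary channels in the real NHD network would need a genuine shortest-path search. The reach IDs and the `trace_to_ocean` / `observed_distribution` names are illustrative only.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical network: reach ID -> the single reach immediately downstream.
# None means the reach drains directly to the ocean.
DownstreamMap = Dict[str, Optional[str]]


def trace_to_ocean(start: str, downstream: DownstreamMap) -> List[str]:
    """Return the reaches from the observed reach down to the ocean, in order."""
    path: List[str] = []
    seen: Set[str] = set()
    current: Optional[str] = start
    while current is not None:
        if current in seen:  # guard against a malformed (cyclic) network
            raise ValueError(f"cycle detected at reach {current!r}")
        seen.add(current)
        path.append(current)
        current = downstream[current]  # KeyError if a reach is missing from the map
    return path


def observed_distribution(observed_reaches: Iterable[str],
                          downstream: DownstreamMap) -> Set[str]:
    """Union of every observation-to-ocean path: the 'Observed Distribution'."""
    distribution: Set[str] = set()
    for reach in observed_reaches:
        distribution.update(trace_to_ocean(reach, downstream))
    return distribution


if __name__ == "__main__":
    network: DownstreamMap = {
        "trib_a": "main_2",
        "trib_b": "main_2",
        "trib_c": "main_1",
        "main_2": "main_1",
        "main_1": None,  # river mouth
    }
    # One observation on trib_a yields trib_a -> main_2 -> main_1.
    print(sorted(observed_distribution(["trib_a"], network)))
```

    In the example, `trib_b` and `trib_c` carry no observations and so never enter the result, which is exactly why the layer can only underestimate the true distribution.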

  14. United States US Forest Service Surface Drinking Water Importance

    • koordinates.com
    csv, dwg, geodatabase +6
    Updated Sep 19, 2022
    US Forestry Service (2022). United States US Forest Service Surface Drinking Water Importance [Dataset]. https://koordinates.com/layer/110480-united-states-us-forest-service-surface-drinking-water-importance/
    Available download formats: shapefile, geodatabase, kml, mapinfo mif, mapinfo tab, pdf, geopackage / sqlite, dwg, csv
    Dataset updated
    Sep 19, 2022
    Dataset provided by
    U.S. Department of Agriculture Forest Service (http://fs.fed.us/)
    Authors
    US Forestry Service
    Description

    Surface Drinking Water Importance Index - National Extent

    Abstract: The Forests to Faucets dataset provides a watershed index of surface drinking water importance, a watershed index of forest importance to surface drinking water, and a watershed index highlighting the extent to which development, fire, and insects and disease threaten forests important for surface drinking water. The Forests to Faucets layer does not cover Alaska, Hawaii, or US Territories. This dataset was created using the 2001 National Land Cover Database and 2005 housing development estimates. For updated forest and development statistics, please refer to the 2015 Forests on the Edge dataset.

    Purpose: The results of the Forests to Faucets assessment provide information that can identify areas of interest for protecting surface drinking water quality. The spatial dataset can be incorporated into broad-scale planning, such as the State Forest Action Plans, and into existing decision support tools that currently lack spatial data on important areas for surface drinking water. This project also lays the groundwork for identifying watersheds where a payment for watershed services (PWS) scheme may be an option for financing forest conservation and management on private, unprotected forest lands. In perhaps its most basic but most important role, this work can serve as an educational tool illustrating the link between forests and the provision of key watershed-based ecosystem services.

  15. Resilience dataset for industrial and service development in countries along...

    • data.tpdc.ac.cn
    • tpdc.ac.cn
    zip
    Updated May 19, 2022
    Xinliang XU (2022). Resilience dataset for industrial and service development in countries along the Belt and Road (2000-2019) [Dataset]. http://doi.org/10.11888/HumanNat.tpdc.272269
    Available download formats: zip
    Dataset updated
    May 19, 2022
    Dataset provided by
    TPDC
    Authors
    Xinliang XU
    Description

    The resilience dataset reflects the level of resilience of industrial and service sector development in countries along the Belt and Road; the higher the value, the stronger the resilience. The data products were prepared with reference to the World Bank's statistical database, using two indicators for countries along the Belt and Road from 2000 to 2019: the value added of industry as a percentage of GDP and the value added of the service sector as a percentage of GDP. The year-on-year changes of each indicator were computed and, based on a sensitivity and adaptability analysis, a comprehensive diagnostic was performed to generate the resilience products. The dataset is an important reference for analysing and comparing the current resilience of industrial and service sector development in each country.
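    The first step described above, taking year-on-year changes of a %-of-GDP indicator, can be sketched in Python. The description does not give the formulas for the sensitivity and adaptability analysis or the final diagnostic, so this sketch stops at the year-on-year changes and does not reproduce the resilience index itself. The function name and the sample values are hypothetical, and "change" is assumed here to mean an absolute difference (percentage points) rather than a relative one.

```python
from typing import Dict


def year_on_year_change(series: Dict[int, float]) -> Dict[int, float]:
    """Change from the previous year; for %-of-GDP indicators this is in percentage points.

    Years without an observation for the immediately preceding year are skipped.
    """
    return {
        year: series[year] - series[year - 1]
        for year in sorted(series)
        if year - 1 in series
    }


if __name__ == "__main__":
    # HYPOTHETICAL values for one country (industry value added, % of GDP).
    industry_share = {2000: 30.0, 2001: 31.5, 2002: 30.5, 2004: 29.0}
    print(year_on_year_change(industry_share))
```

    The same function would be applied separately to the industry and service-sector series; 2004 is skipped in the example because 2003 is missing.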

  16. Natural Diversity Database

    • geodata.ct.gov
    Updated Feb 21, 2019
    Department of Energy & Environmental Protection (2019). Natural Diversity Database [Dataset]. https://geodata.ct.gov/maps/CTDEEP::natural-diversity-database
    Dataset updated
    Feb 21, 2019
    Dataset authored and provided by
    Department of Energy & Environmental Protection
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    See the full Resource Data Guide here.

    Abstract: The Natural Diversity Database Areas layer is a 1:24,000-scale, polygon feature-based layer that represents general locations of endangered, threatened and special concern species. The layer is based on information collected by DEEP biologists, cooperating scientists, conservation groups and landowners. In some cases an occurrence represents a location derived from literature, museum records and specimens. These data are compiled and maintained by the DEEP Bureau of Natural Resources, Natural Diversity Database Program. The layer is updated every six months and reflects information that has been submitted and accepted up to that point. The layer includes state and federally listed species. It does not include Critical Habitats, Natural Area Preserves, designated wetland areas or wildlife concentration areas. These general locations were created by randomly shifting the true locations of terrestrial species and then adding a 0.25-mile buffer to each point, and by mapping linear segments with a 300-foot buffer for aquatic, riparian and coastal species. The exact location of the species observation falls somewhere within the polygon area, not necessarily at its center. Attribute information includes the date when these data were last updated. Species names are withheld to protect sensitive species from collection and disturbance. Data are compiled at 1:24,000 scale. These data are updated every six months, approximately in June and December. It is important to use the most current data available.

    Purpose: This dataset was developed to help state agencies and landowners comply with the State Endangered Species Act. Under the Act, state agencies are required to ensure that any activity authorized, funded or performed by the state does not threaten the continued existence of endangered or threatened species or their essential habitat. Applicants for certain state and local permits may be required to consult with the Department of Energy and Environmental Protection's Natural Diversity Data Base (NDDB) as part of the permit process. Follow the instructions provided in the appropriate permit guidance. If you require a federal endangered species review, work with your federal regulatory agency and review the US Fish & Wildlife Service IPaC tool.

    Natural Diversity Data Base Areas are intended to be used as a pre-screening tool to identify potential impacts to known locations of state-listed species. To use this data for a site-based endangered species review, locate the project boundaries and any additionally affected areas on the map. If any part of the project is within an NDDB Area, the project may conflict with listed species. In the case of a potential conflict, an Environmental Review Request (https://portal.ct.gov/deep-nddbrequest) should be submitted to the Natural Diversity Data Base for further review. DEEP will provide recommendations for avoiding impacts to state-listed species. Additional on-site surveys may be requested of the applicant depending on the nature and scope of the project; for this reason, applicants should apply early in the planning stages. Not all land use choices will affect the particular species present, and minor modifications to the proposed plan can often alleviate conflicts with state-listed species.

    Other uses of the data include targeting areas for conservation or site management to enhance and protect rare species habitats.

    Supplemental information: For additional information, refer to the Department of Energy and Environmental Protection Endangered Species web page at https://portal.ct.gov/DEEP/Endangered-Species/Connecticuts-Endangered-Threatened-and-Special-Concern-Species
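    The general-location procedure for terrestrial species (random shift, then a 0.25-mile buffer) and the pre-screening overlap check described above can be sketched in Python. This is a minimal illustration, not DEEP's method. It assumes planar coordinates in feet, a uniformly random shift whose length is capped below the buffer radius (the description does not state the maximum shift, so the 90% cap is hypothetical; it only has to be smaller than the radius for the true location to stay inside the polygon), exact circles instead of buffered polygons, and a rectangular project footprint. The 300-foot line buffer for aquatic species is not modelled.

```python
import math
import random
from dataclasses import dataclass

FEET_PER_MILE = 5280.0
BUFFER_FT = 0.25 * FEET_PER_MILE  # 0.25-mile buffer for terrestrial species (from the description)


@dataclass
class Circle:
    cx: float
    cy: float
    r: float


def general_location(x: float, y: float, rng: random.Random,
                     buffer_ft: float = BUFFER_FT) -> Circle:
    """Shift the true point randomly, then buffer the shifted point.

    Because the shift is shorter than the radius, the true point always lies
    inside the resulting circle, but generally not at its center.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = rng.uniform(0.0, 0.9 * buffer_ft)  # ASSUMPTION: hypothetical maximum shift
    return Circle(x + dist * math.cos(angle), y + dist * math.sin(angle), buffer_ft)


def intersects_rect(c: Circle, xmin: float, ymin: float,
                    xmax: float, ymax: float) -> bool:
    """Pre-screening test: does a rectangular project footprint touch the area?"""
    # Clamp the circle center to the rectangle to find the closest rectangle point.
    nx = min(max(c.cx, xmin), xmax)
    ny = min(max(c.cy, ymin), ymax)
    return (nx - c.cx) ** 2 + (ny - c.cy) ** 2 <= c.r ** 2


if __name__ == "__main__":
    rng = random.Random(7)
    area = general_location(1000.0, 2000.0, rng)
    # A small project footprint around the true location always overlaps the area.
    print(intersects_rect(area, 990.0, 1990.0, 1010.0, 2010.0))
```

    A `True` result only flags a potential conflict; per the description, the next step is an Environmental Review Request, not a conclusion about impacts.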

  17. Clinical Data Management System (CDMS) Market Report | Global Forecast From...

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    Dataintelo (2024). Clinical Data Management System (CDMS) Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/clinical-data-management-system-cdms-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Clinical Data Management System (CDMS) Market Outlook



    The global Clinical Data Management System (CDMS) market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach around USD 2.5 billion by 2032, expanding at a compound annual growth rate (CAGR) of approximately 8.0% during the forecast period. This growth is propelled by the increasing demand for efficient data management solutions in clinical trials, driven by the rising complexity of clinical research and regulatory requirements. The healthcare industry's shift towards digitalization and the growing adoption of cloud-based solutions also play crucial roles in enhancing the market dynamics. Furthermore, the rapid technological advancements in data management systems are expected to streamline clinical trial processes, further bolstering market growth.
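    The growth figures above follow the standard compound annual growth rate relation, CAGR = (end value / start value)^(1 / years) - 1. The short Python sketch below computes it and projects a value forward; the function names are illustrative, and treating 2023 to 2032 as 9 compounding years is an assumption, since the summary does not state the report's base year.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` periods apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` periods."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Figures from the report summary: USD 1.2 bn (2023) -> USD 2.5 bn (2032).
    implied = cagr(1.2, 2.5, 2032 - 2023)
    print(f"implied CAGR: {implied:.1%}")
    # What the stated ~8.0% rate would produce over the same 9 years:
    print(f"1.2 bn at 8.0% for 9 years: {project(1.2, 0.08, 9):.2f} bn")
```

    On these assumptions the implied rate is roughly 8.5%, and 8.0% over 9 years would give about USD 2.4 billion, so the headline numbers are consistent only approximately; the gap may reflect rounding or a different base year.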



    The escalation in the volume and complexity of clinical data is a significant growth factor for the CDMS market. As pharmaceutical and biotechnology companies strive to accelerate drug development processes, the need for sophisticated data management solutions that can handle large datasets while ensuring data integrity and regulatory compliance becomes imperative. Additionally, the increasing number of clinical trials, driven by the rising prevalence of chronic diseases and the need for innovative therapies, is creating a substantial demand for CDMS. These systems enable companies to manage data more efficiently and effectively, reducing errors and streamlining workflows, thus improving overall productivity and time-to-market for new drugs.



    Another critical factor contributing to the growth of the CDMS market is the increasing adoption of cloud-based solutions. Cloud-based CDMS platforms offer several advantages over traditional on-premises solutions, such as scalability, cost-effectiveness, and ease of access to data from remote locations. These benefits are particularly appealing to small and medium-sized enterprises (SMEs), which may not have the resources to invest in extensive IT infrastructure. The flexibility and scalability provided by cloud-based solutions also enable organizations to adapt to changing business needs and regulatory requirements more easily, making them an attractive option for many end-users. As a result, the demand for cloud-based CDMS is expected to witness significant growth throughout the forecast period.



    The growing emphasis on data security and privacy is also driving the CDMS market forward. With the increasing digitization of healthcare data, ensuring the security and privacy of sensitive information has become a top priority for organizations involved in clinical research. CDMS providers are continuously developing advanced security features to protect against data breaches and ensure compliance with regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). These efforts are not only essential for maintaining trust with stakeholders but also for preventing potential financial and reputational damage. As regulations become more stringent, the demand for CDMS with robust security features is expected to rise, contributing to the market's growth.



    Regionally, North America currently holds the largest share of the CDMS market, primarily due to the presence of a well-established healthcare infrastructure and a high number of clinical trials conducted in the region. The United States, in particular, is a major contributor to market growth, driven by the strong focus on research and development activities and the adoption of advanced technologies. However, the Asia Pacific region is expected to witness the highest growth rate over the forecast period, with a CAGR surpassing 9%. This growth is attributed to the increasing investments in healthcare infrastructure, the rising number of clinical trials, and supportive government initiatives. Furthermore, the growing presence of Contract Research Organizations (CROs) in countries like India and China is anticipated to drive the demand for CDMS in the region.



    Component Analysis



    The Clinical Data Management System (CDMS) market can be broadly segmented into software and services, each playing a critical role in the ecosystem of clinical trial data management. Software components of CDMS include various applications and platforms designed to streamline data collection, validation, and storage. These software solutions are integral to managing the vast amounts of data generated during clinical trials and ensuring compliance with stringent regulatory standards. Key functionalities of CDMS software include electronic data ca

  18. The Federal Big Data Research and Development Strategic Plan

    • datasets.ai
    • s.cnmilf.com
    • +2more
    33
    Updated Aug 9, 2024
    Networking and Information Technology Research and Development, Executive Office of the President (2024). The Federal Big Data Research and Development Strategic Plan [Dataset]. https://datasets.ai/datasets/the-federal-big-data-research-and-development-strategic-plan
    Available download formats: 33
    Dataset updated
    Aug 9, 2024
    Authors
    Networking and Information Technology Research and Development, Executive Office of the President
    Description

    Summary: This Plan is an important milestone in the Administration's Big Data Research and Development (R&D) Initiative

  19. Dataset for: Navigating Ecosystem Services Trade-offs: A Global...

    • zenodo.org
    bin, pdf
    Updated Aug 7, 2024
    Maria Jose Martinez-Harms; Barbara Larrain Barrios (2024). Dataset for: Navigating Ecosystem Services Trade-offs: A Global Comprehensive Review [Dataset]. http://doi.org/10.5281/zenodo.13249080
    Available download formats: pdf, bin
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Jose Martinez-Harms; Barbara Larrain Barrios
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Aug 2024
    Description

    Methods

    The dataset is the output of a comprehensive literature-based search that aims to collate all the evidence on where relationships among ecosystem services (ES) have been mentioned and addressed. We applied systematic mapping based on the “Guidelines for Systematic Review in Environmental Management” developed by the Centre for Evidence-Based Conservation at Bangor University (Pullin and Stewart 2006).

    The methodological framework followed the standard stages outlined for systematic mapping in environmental sciences (James et al. 2016). Briefly, we defined the scope and objectives:

    · To comprehensively review and further explore the global evidence on ES trade-offs and synergies across all systems, including terrestrial, freshwater, and marine.

    · To compile the evidence on trade-offs and synergies among multiple ES interacting across various ecosystems.

    · To perform a geographical and temporal trend analysis of the distribution of studies across the world, examining how the focus on different ecosystem types and ES categories has evolved, in order to highlight gaps and biases.

    Then we set the criteria for study inclusion (Table 1), searched the evidence, coded, and produced the database. Extracted article information including the specific criteria is detailed in Table 1.

    The first step was to search the ISI Web of Knowledge Core Collection (http://apps.webofknowledge.com) database, targeting the ecosystem services literature and studies dealing with trade-offs/synergies, win-win outcomes or bundles when managing different ecosystem services in the landscape/seascape. All peer-reviewed journal articles written in English and Spanish were considered for review.

    The peer-reviewed literature from 2005 to 2021 was reviewed to identify relevant studies according to specific search terms. The search terms and descriptive words were derived from Howe et al. (2014), with “bundles” and “co-benefits” added. The truncation wildcard ‘*’, which matches any sequence of letters in its position, was applied to word roots where several different endings were possible (Figure 1). The search terms used were:

    (“*ecosystem service*” OR “environment* service*” OR “ecosystem* approach*” OR “ecosystem good*” OR “environment* good*”)

    AND

    (“*trade-off*” OR “tradeoff*” OR “synerg*” OR “win-win*” OR “bundle*” OR “cost*and benefit*” OR “co-benefit*”) n=5194
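    To make the structure of the search string above concrete, here is a minimal Python sketch that applies the same two term groups to a piece of text: a match requires at least one term from the first group AND at least one from the second. It only approximates Web of Science behaviour: each ‘*’ is translated to "zero or more letters or hyphens", matching is case-insensitive substring search, and the term “cost*and benefit*” is assumed to mean “cost* and benefit*” (the missing space looks like an extraction artifact). Function names are illustrative.

```python
import re
from typing import List

# Term groups transcribed from the search string above.
ES_TERMS = ["*ecosystem service*", "environment* service*", "ecosystem* approach*",
            "ecosystem good*", "environment* good*"]
RELATION_TERMS = ["*trade-off*", "tradeoff*", "synerg*", "win-win*", "bundle*",
                  "cost* and benefit*",  # ASSUMPTION: read "cost*and benefit*" with a space
                  "co-benefit*"]


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a '*'-truncated term into a case-insensitive regex.

    Each '*' becomes 'zero or more letters or hyphens'; everything else is literal.
    """
    parts = [re.escape(part) for part in term.split("*")]
    return re.compile("[a-z-]*".join(parts), re.IGNORECASE)


ES_PATTERNS: List[re.Pattern] = [wildcard_to_regex(t) for t in ES_TERMS]
RELATION_PATTERNS: List[re.Pattern] = [wildcard_to_regex(t) for t in RELATION_TERMS]


def matches_query(text: str) -> bool:
    """(any ES term) AND (any relation term), searched anywhere in the text."""
    return (any(p.search(text) for p in ES_PATTERNS)
            and any(p.search(text) for p in RELATION_PATTERNS))


if __name__ == "__main__":
    print(matches_query("Trade-offs between ecosystem services in urban forests"))
```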

    In the second step (Figure 1), papers were preliminarily coded with a semantic analysis using the R package Bibliometrix (http://www.bibliometrix.org). Papers were classified into three systems: terrestrial, marine, and freshwater (Table 1). Papers covering multiple systems or transitional habitats, and those that could not be classified, were assigned to “other” (Mazor et al. 2018). Articles were classified based on the occurrence of the most frequent system words in their title, keywords, and abstract (Mazor et al. 2018). The set of system-specific words was determined by extracting the 250 most frequently used keywords from all considered articles and assigning each word to one of the systems (each article could fall into only one of the four categories). Using this technique, we classified 100% of the papers. To further enrich the dataset and make it a useful repository for science and policy, an additional sub-classification was performed, categorizing papers as Coastal, Urban, Wetlands, Forest, Mountain, Freshwater, Agroecosystems, or Others, the last mainly representing multiple ecosystems (Table S1). This classification approach enhances the dataset’s utility for scientific and policy-making applications.
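    The single-label system classification described above (counting system-specific words in the title, keywords, and abstract) can be sketched in Python. The keyword lists below are hypothetical stand-ins for the study's lists derived from the 250 most frequent keywords, and the decision rule is an assumption: the system with the most hits wins, and a tie or zero hits yields “other”, which only approximates the study's handling of multi-system and transitional papers.

```python
import re
from collections import Counter
from typing import Dict, Set

# HYPOTHETICAL keyword lists for illustration only.
SYSTEM_WORDS: Dict[str, Set[str]] = {
    "terrestrial": {"forest", "grassland", "agriculture", "soil"},
    "marine": {"ocean", "fisheries", "reef", "marine"},
    "freshwater": {"river", "lake", "wetland", "watershed"},
}


def classify_system(title: str, keywords: str, abstract: str) -> str:
    """Assign the system whose words occur most often; ties or no hits -> 'other'."""
    tokens = re.findall(r"[a-z]+", f"{title} {keywords} {abstract}".lower())
    counts: Counter = Counter()
    for system, words in SYSTEM_WORDS.items():
        counts[system] = sum(1 for token in tokens if token in words)
    ranked = counts.most_common()
    best_system, best_count = ranked[0]
    if best_count == 0 or (len(ranked) > 1 and ranked[1][1] == best_count):
        return "other"
    return best_system


if __name__ == "__main__":
    print(classify_system("River restoration", "watershed; lake", ""))
```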

    In the third step (Figure 1), applying the same technique, we classified the papers into four ES categories: habitat (supporting, biodiversity-related), provisioning, regulating, and cultural services (De Groot et al. 2010; MEA 2005; Sukhdev 2010; Wallace 2007). For the classification into ES categories, articles could fall into one or more of the four categories (see Table 1 for examples of the keywords used to classify ecosystems, ES categories, and countries). Applying this technique, we excluded 2149 papers that were not classified into any of the ES categories, resulting in 3629 papers (see Figure 1).
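    Unlike the system step, the ES-category step described above is multi-label, and papers with no category are dropped. A minimal Python sketch of that logic follows; the keyword sets are hypothetical (the study's own examples are in Table 1), and simple single-word token matching is an assumption.

```python
import re
from typing import Dict, Set

# HYPOTHETICAL keyword lists for illustration only.
ES_CATEGORY_WORDS: Dict[str, Set[str]] = {
    "habitat": {"biodiversity", "habitat", "pollination"},
    "provisioning": {"timber", "food", "crop", "fisheries"},
    "regulating": {"carbon", "flood", "erosion", "climate"},
    "cultural": {"recreation", "tourism", "aesthetic", "spiritual"},
}


def es_categories(text: str) -> Set[str]:
    """Multi-label: every ES category with at least one keyword hit."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {cat for cat, words in ES_CATEGORY_WORDS.items() if tokens & words}


def filter_papers(papers: Dict[str, str]) -> Dict[str, Set[str]]:
    """Label each paper and exclude those assigned to no ES category."""
    labelled = {pid: es_categories(text) for pid, text in papers.items()}
    return {pid: cats for pid, cats in labelled.items() if cats}


if __name__ == "__main__":
    print(filter_papers({"p1": "Carbon storage and timber trade-offs",
                         "p3": "Bibliometric trends"}))
```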

    In the fourth step (Figure 1), an initial screening was conducted to identify papers that did not align with the review objective of assessing ES trade-offs and synergies to inform policy and management decisions. We manually reviewed the title of each paper in the dataset, excluding those from other fields or that did not align with the review objectives. In this initial assessment, we excluded 347 papers, leaving 3,286 papers for further review. A descriptive analysis of this 3,286-article dataset was performed to examine the distribution of ES categories within each ecosystem type over the study period. This analysis allowed us to determine the prevalence of each ES category in different ecosystem types and to identify temporal trends and patterns. The number of occurrences of each ES category within each ecosystem type was calculated as counts, allowing ES distributions to be compared across ecosystem types.
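    The counting described above can be sketched in Python with `collections.Counter`. The input layout, one `(ecosystem type, publication year, set of ES categories)` tuple per paper, is an assumption for illustration; a paper tagged with several ES categories contributes once to each of them, and the per-year tally supports the temporal trends.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Paper = Tuple[str, int, Set[str]]  # (ecosystem type, year, ES categories) -- assumed layout


def es_counts(papers: Iterable[Paper]) -> Tuple[Counter, Counter]:
    """Occurrences of each ES category per ecosystem type, overall and per year."""
    totals: Counter = Counter()
    yearly: Counter = Counter()
    for ecosystem, year, categories in papers:
        for category in categories:
            totals[(ecosystem, category)] += 1
            yearly[(ecosystem, category, year)] += 1
    return totals, yearly


if __name__ == "__main__":
    sample = [("Forest", 2010, {"regulating", "provisioning"}),
              ("Forest", 2015, {"regulating"})]
    totals, yearly = es_counts(sample)
    print(totals[("Forest", "regulating")])
```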

    In the fifth step (Figure 1), we visually represented the geographical distribution and focus of ES studies across the world. With the studies classified by ES category and ecosystem type, the papers were coded according to the country where the study was performed. A specific country could be assigned to 2636 studies; the 650 studies that did not specify a country of study were removed. Of these 2636 papers, 499 were global studies covering several countries.

    We developed global maps (Figure 1), each offering a different perspective on the ES research landscape. The first map presents the total number of ES trade-off studies conducted worldwide, illustrating the geographical spread and concentration of research effort and giving a clear overview of regions that have been extensively studied and those that may require more attention in future research. In addition, we calculated two metrics to assess research productivity more comprehensively: the number of research papers per capita and the number of research papers relative to Gross Domestic Product (GDP). For population and GDP, we used the most recent available data from the World Bank (https://data.worldbank.org). These metrics normalize the data by population size and economic output, providing a more balanced view of research activity across countries (Figure S3).
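    The two normalized metrics described above are simple ratios; a minimal Python sketch follows. The scaling units (papers per million inhabitants, papers per billion USD of GDP) are assumptions chosen for readability, since the text does not state them, and countries lacking a World Bank figure are skipped rather than imputed.

```python
from typing import Dict


def normalise_counts(counts: Dict[str, int],
                     population: Dict[str, float],
                     gdp_usd: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Papers per million inhabitants and per billion USD of GDP, by country.

    Countries missing a population or GDP figure are skipped rather than guessed.
    """
    result: Dict[str, Dict[str, float]] = {}
    for country, n_papers in counts.items():
        pop = population.get(country)
        gdp = gdp_usd.get(country)
        if not pop or not gdp:
            continue
        result[country] = {
            "papers": float(n_papers),
            "per_million_people": n_papers / (pop / 1e6),
            "per_billion_usd_gdp": n_papers / (gdp / 1e9),
        }
    return result


if __name__ == "__main__":
    # HYPOTHETICAL figures for a single country code "A".
    print(normalise_counts({"A": 50}, {"A": 5e6}, {"A": 2.5e11}))
```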

    Detailed maps were created featuring pie charts that highlight the ES categories and ecosystem types addressed in each country, showing how the various ES categories and ecosystems are represented in different parts of the world. Finally, we assessed ES trade-off studies by world region (Africa, Antarctica, Asia, Australasia, Europe, Latin America, and North America), looking at the relationships between ES categories. We considered both papers that evaluated more than one ES category and papers that considered only one. This country- and region-level analysis offers insights into regional research trends and priorities, contributing to a more localized understanding of ES studies.
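    One way to tabulate the regional relationships between ES categories described above is to count each pair of categories that co-occur in a paper, keeping single-category papers in a separate tally. The Python sketch below does that; it is an illustration under stated assumptions (one `(region, set of ES categories)` tuple per paper), not the authors' analysis code.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def category_pairs_by_region(papers: Iterable[Tuple[str, Set[str]]]) -> Tuple[Counter, Counter]:
    """Count co-occurring ES-category pairs per region, plus single-category papers."""
    pairs: Counter = Counter()
    singles: Counter = Counter()
    for region, categories in papers:
        cats = sorted(categories)  # sorted so each pair has one canonical order
        if len(cats) == 1:
            singles[(region, cats[0])] += 1
        for a, b in combinations(cats, 2):
            pairs[(region, a, b)] += 1
    return pairs, singles


if __name__ == "__main__":
    pairs, singles = category_pairs_by_region(
        [("Europe", {"regulating", "provisioning"}), ("Europe", {"cultural"})])
    print(pairs, singles)
```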

    In the sixth step (Figure 1), each publication in this review was critically appraised to evaluate the quality of the papers included. Our critical appraisal is founded on the comprehensive, multidimensional approach of Belcher et al. (2016) for evaluating research quality, which aligns well with the interdisciplinary nature of our study. Belcher et al. (2016) developed a robust framework that incorporates essential principles and criteria for assessing the quality of transdisciplinary research. This is particularly relevant for ecosystem services science and for our review, which contributes to advancing current knowledge by systematically synthesizing evidence on relationships among various ES across these diverse systems.

    The Belcher et al. (2016) framework emphasizes four main principles: relevance, credibility (which we adapted as methodological transparency), legitimacy (generalizability in our context), and effectiveness (significance). To maintain simplicity and consistency across the large number of studies, a continuous score ranging from 0 to 1 was applied to each of the four main criteria: a value closer to 0 indicates that the criterion is not met, while a value closer to 1 indicates that it is more closely met. This scoring method served as a useful indicator of the overall quality of each paper and of how well it met the review's goals.
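    The four-criterion 0 to 1 scoring described above can be captured in a small Python data structure that validates each score. How the four scores were combined into an overall indicator is not stated, so the unweighted mean in `overall()` is purely an assumption; the field names follow the adapted criteria names in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appraisal:
    relevance: float
    transparency: float      # "credibility" in Belcher et al. (2016)
    generalizability: float  # "legitimacy"
    significance: float      # "effectiveness"

    def __post_init__(self) -> None:
        # Every criterion is scored on a continuous 0-1 scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def overall(self) -> float:
        """ASSUMPTION: unweighted mean of the four criteria."""
        return (self.relevance + self.transparency
                + self.generalizability + self.significance) / 4.0


if __name__ == "__main__":
    print(Appraisal(1.0, 0.5, 0.5, 0.0).overall())
```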

    Methodological Transparency was assessed based on the clarity and completeness of methodological descriptions, including data availability, the rigor of statistical analyses, methodological detail, and reproducibility of the findings. This criterion assesses the transparency and rigor of the study's methodology, including data collection, analysis, and reporting (Belcher et al. 2016). Relevance was evaluated by the study's alignment with the review's objectives, its importance to the field, and its practical applicability. This includes the extent to which the study addresses pertinent research

  20. ‘Coho Distribution [ds326]’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jul 12, 2007
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2007). ‘Coho Distribution [ds326]’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-coho-distribution-ds326-8ce8/latest
    Dataset updated
    Jul 12, 2007
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Coho Distribution [ds326]’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/56847514-cf82-4bbe-809b-05499d165c9a on 26 January 2022.

    --- Dataset description provided by original source is as follows ---

    June 2016 Version

    This dataset represents the "Observed Distribution" for coho salmon in California, using only observations made between 1990 and the present. It was developed for the express purpose of assisting with species recovery planning efforts. The process for developing this dataset was to collect as many observations of the species as possible and derive the stream-based geographic distribution for the species based solely on these positive observations.

    For the purpose of this dataset, an observation is defined as a report of a sighting or other evidence of the presence of the species at a given place and time. As such, observations are modeled by year observed as point locations in the GIS. All observations were collected with information on who reported the observation, their agency/organization/affiliation, the date on which they observed the species, who compiled the information, etc. This information is maintained in the developer's file geodatabase (©Environmental Systems Research Institute (ESRI) 2016).

    To develop this distribution dataset, the species observations were applied to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography. For each observation, a path was traced down the hydrography from the point of observation to the ocean, thereby deriving the shortest migration route from the point of observation to the sea. Appending all of these migration paths together yields the "Observed Distribution" for the species.

    It is important to note that this layer does not attempt to model the entire possible distribution of the species. Rather, it represents only the known distribution, based on where the species has been observed and reported. While some observations do represent the upstream extent of the species (e.g., an observation made at a hard barrier), most observations indicate only where the species was sampled for or otherwise observed. Because of this, this dataset likely underestimates the absolute geographic distribution of the species.

    It is also important to note that the species may not be found every year in all indicated reaches, due to natural variations in run size, water conditions, and other environmental factors. As such, the information in this dataset should not be used to verify that the species is currently present in a given stream. Conversely, the absence of distribution linework for a given stream does not necessarily indicate that the species does not occur in that stream.

    The observation data were compiled from a variety of disparate sources, including but not limited to CDFW, USFS, NMFS, timber companies, and the public. Forms of documentation include CDFW administrative reports, personal communications with biologists, observation reports, and literature reviews. The source of each feature (to the best available knowledge) is included in the data attributes for the observations in the geodatabase, but not for the resulting linework. The spatial data have been referenced to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography.

    Usage of this dataset:

    Examples of appropriate uses include:
    - Species recovery planning
    - Evaluation of future survey sites for the species
    - Validating species distribution models

    Examples of inappropriate uses include:
    - Assuming that the absence of a line feature means the species is not present in that stream
    - Using this data to make parcel-level or ground-level land use management decisions
    - Using this dataset to prove or support the non-existence of the species at any spatial scale
    - Assuming that the line feature represents the maximum possible extent of the species' distribution

    All users of this data should seek the assistance of qualified professionals such as surveyors, hydrologists, or fishery biologists as needed to ensure that they possess complete, precise, and up-to-date information on species distribution and water body location.

    Any copy of this dataset is considered a snapshot of the species distribution at the time of release. It is incumbent upon the user to ensure that they have the most recent version before making management or planning decisions. Please refer to the "Use Constraints" section below.

    --- Original source retains full ownership of the source dataset ---

Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv

Data from: Data reuse and the open data citation advantage

Available download formats: zip
Dataset updated
Oct 1, 2013
Dataset provided by
National Evolutionary Synthesis Center
Authors
Heather A. Piwowar; Todd J. Vision
License

https://spdx.org/licenses/CC0-1.0.html

Description

Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets.

Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
