Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction

These data were collected in the framework of Dr. Annemie Maertens’ PhD dissertation during the period August 2007 – July 2009. The dissertation was undertaken at Cornell University, but executed in India in collaboration with the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT). The project was sponsored through an NSF Doctoral Dissertation Improvement Grant (Grant No. 0649330).

The main goal of this project was to collect and analyse household survey data in the Indian states of Andhra Pradesh and Maharashtra in order to gain a better understanding of the role of social networks and identity in economic decision-making. The first panel of this research studied the role of social learning and social pressures in Bacillus thuringiensis (Bt) cotton adoption using data from three villages (Aurepalle, Kanzara and Kinkhed). The second panel of this research studied the role of social norms in educational decisions and aspirations using data from three villages (Dokur, Kalman and Shirapur).

The data collection consisted of five phases: (1) qualitative round (to determine the topic of the two panels); (2) trial round (to field test the questionnaires); (3) training round (to train the enumerators); (4) quantitative collection round (to collect the household-level and village-level data); (4’) data entry of (4); (5) data validation round (to collect additional data to correct the missing variables and inconsistencies uncovered in (4’)).

The villages selected for this study are part of the Village Level Studies (VLS) program of ICRISAT. In this program, ICRISAT followed 300 randomly selected households from six villages every three weeks during the period 1975-1985. In 2001, ICRISAT restarted the panel, revisiting 185 of the first-generation VLS households and their split-offs, in addition to 261 newly added households. This data collection is currently ongoing.

To obtain the 1975-1985 and 2001-2006 data: http://www.icrisat.org/gt-mpi/knowledgeBase/Databases/vls.asp
To obtain the 2001-2006 data, see also: http://www.economics.ox.ac.uk/members/stefan.dercon/icrisat/ICRISAT/index.html

Zip files contain pdf / doc / dta files. Stata (https://www.stata.com/) is required to view the .dta files. Please refer to the read me.pdf before using the data collected.

Published papers resulting from these data

Maertens, Annemie (2017). Who cares what others think (or do)? Social learning and social pressures in cotton farming in India. American Journal of Agricultural Economics, 99(4): 988-1007.
Maertens, Annemie, AV Chari and DR Just (2014). Why farmers sometimes love risks: evidence from India. Economic Development and Cultural Change, 62(2): 239-274.
Maertens, Annemie and CB Barrett (2013). Measuring social networks' effects on agricultural technology adoption. American Journal of Agricultural Economics, 95(2): 353-359.
Chari, AV and Annemie Maertens (2014). Gender, productive ability and the perceived returns to education: evidence from rural India. Economics Letters, 122(2): 253-257.
Maertens, Annemie (2013). Social norms and aspirations: age of marriage and education in rural India. World Development, 47: 1-15.
Maertens, Annemie (2011). Does education pay off? Subjective expectations with regard to education in rural India. Economic and Political Weekly, 46(9): 58-63.
Maertens, Annemie and AV Chari (2020). What's your child worth? An analysis of expected dowry payments in rural India. World Development, 130.
The survey was sponsored by NBC News and the Wall Street Journal and conducted by the Hart-Teeter Research Companies. A national sample of 1,255 adults, including an oversample of 250 Black adults, was interviewed on September 24-27, 1994. Major topics covered: GATT; political leaders; US involvement in Haiti; inner cities; race relations.
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at the Roper Center for Public Opinion Research at https://doi.org/10.25940/ROPER-31094750. We highly recommend using the Roper Center version as they may make this dataset available in multiple data formats in the future.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The collection of the Paleontological Research Institution (PRI) includes 7 million specimens, making it among the 10 largest invertebrate paleontology collections in the United States. Most of the collection consists of invertebrate fossils (representing almost every major group of organisms from around the world over the past 2 billion years), with major strengths in Cenozoic marine mollusks of the Western Hemisphere, Paleozoic marine invertebrates of New York State, and Cenozoic benthic foraminifera of the U.S. Coastal Plains and Caribbean. The collection also includes significant holdings of Recent mollusks. PRI houses all non-botanical fossils and Recent mollusks formerly held at Cornell University. PRI's collection of Type and Figured specimens (also one of the nation's 10 largest) includes more than 15,000 specimens, many of which were published in PRI’s journal, Bulletins of American Paleontology—one of the oldest peer-reviewed paleontological journals in the world.
The survey was sponsored by NBC News and the Wall Street Journal and conducted by the Hart-Breglio Research Companies. A national sample of 1,502 adults was interviewed on May 15-19, 1992. Major topics covered: 1992 Presidential election; Ross Perot; national economy; poverty; race relations.
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at the Roper Center for Public Opinion Research at https://doi.org/10.25940/ROPER-31094729. We highly recommend using the Roper Center version as they made this dataset available in multiple data formats.
https://creativecommons.org/publicdomain/zero/1.0/
For nearly 30 years, ArXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus of information offers significant, but sometimes overwhelming, depth.
In these times of unique global challenges, efficient extraction of insights from data is essential. To help make the arXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more.
Our hope is to empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.
The dataset is freely available via Google Cloud Storage buckets (more info here). Stay tuned for weekly updates to the dataset!
ArXiv is a collaboratively funded, community-supported resource founded by Paul Ginsparg in 1991 and maintained and operated by Cornell University.
The release of this dataset was featured further in a Kaggle blog post here.
This dataset is a mirror of the original ArXiv data. Because the full dataset is rather large (1.1TB and growing), this dataset provides only a metadata file in the json format. This file contains an entry for each paper, containing:
- id: ArXiv ID (can be used to access the paper, see below)
- submitter: Who submitted the paper
- authors: Authors of the paper
- title: Title of the paper
- comments: Additional info, such as number of pages and figures
- journal-ref: Information about the journal the paper was published in
- doi: Digital Object Identifier (https://www.doi.org)
- abstract: The abstract of the paper
- categories: Categories / tags in the ArXiv system
- versions: A version history
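As a minimal sketch of how these entries might be read, the snippet below assumes the metadata file stores one JSON object per line; that layout, the helper name `iter_papers`, and the sample record are illustrative assumptions rather than something stated in this description.

```python
import json
from typing import Iterable, Iterator


def iter_papers(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line.

    Assumption: the metadata file holds one JSON object per line.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample record that uses a few of the field names listed above.
sample = ['{"id": "0704.0001", "title": "An example title", "doi": null}']

for paper in iter_papers(sample):
    print(paper["id"], "-", paper["title"])
```

Because the function accepts any iterable of strings, an open file handle can be passed in place of `sample`, so the file is processed line by line instead of being loaded into memory at once.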
You can access each paper directly on ArXiv using these links:
- https://arxiv.org/abs/{id}: Page for this paper including its abstract and further links
- https://arxiv.org/pdf/{id}: Direct link to download the PDF
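The two link templates can be filled in from the id field of a metadata entry. The helper below is a small illustrative sketch; the function name `arxiv_urls` and the example id are assumptions made for the example.

```python
def arxiv_urls(paper_id: str) -> dict:
    """Build the abstract-page and PDF links for one ArXiv id."""
    return {
        "abs": f"https://arxiv.org/abs/{paper_id}",
        "pdf": f"https://arxiv.org/pdf/{paper_id}",
    }


# Hypothetical id, used only to show the resulting link shapes.
links = arxiv_urls("0704.0001")
print(links["abs"])
print(links["pdf"])
```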
The full set of PDFs is available for free in the GCS bucket gs://arxiv-dataset or through Google API (json documentation and xml documentation).
You can use, for example, gsutil to download the data to your local machine.
```
# List files (cp needs a destination, so listing uses ls):
gsutil ls gs://arxiv-dataset/arxiv/
# Download the PDFs in the 2003 directory (-r copies the directory recursively):
gsutil cp -r gs://arxiv-dataset/arxiv/arxiv/pdf/2003/ ./a_local_directory/
# Download everything under arxiv/:
gsutil cp -r gs://arxiv-dataset/arxiv/ ./a_local_directory/
```
We're automatically updating the metadata as well as the GCS bucket on a weekly basis.
Creative Commons CC0 1.0 Universal Public Domain Dedication applies to the metadata in this dataset. See https://arxiv.org/help/license for further details and licensing on individual papers.
The original data is maintained by ArXiv; huge thanks to the team for building and maintaining this dataset.
We're using https://github.com/mattbierbaum/arxiv-public-datasets to pull the original data; thanks to Matt Bierbaum for providing this tool.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data here originates from a manuscript that is going to be published in a peer-reviewed journal. This data will be linked with the manuscript upon its publication. If further details regarding the data are required, reach out to Evie Brahmstedt via esb279@cornell.edu.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was collected as part of the manuscript entitled 'Elevating the importance of Risk of Bias assessment for ecology and evolution'.
Methods
1) Survey on RoB awareness and use
The survey was approved by the Ethics Committee of the Ruđer Bošković Institute, Zagreb, Croatia, ref. ZV/3218/1-2023. The survey was intended for ecologists and evolutionary biologists who have published at least one meta-analysis. It included non-identifying questions on familiarity with the concept of the Risk of Bias, awareness and use of RoB assessment, and also included general questions on familiarity with meta-analysis, field of research, and career stage.
The survey was created in Google Forms and sent on the 11th September 2023 to the emails of corresponding authors of meta-analyses in ecology and evolution, via mailing lists (NC3 Collaborative Research Centre; Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology; German Zoological Society; Sociedad Española de Etología y Ecología Evolutiva; EuropeList@conbio.org), Slack channels (Big Team Science Conference Slack Channel, German Reproducibility Network Event Slack Channel, ESMARConf Slack Channel), Twitter posts, and the SORTEE newsletter. The survey was open to responses until the 15th October 2023.
To determine the corresponding authors of meta-analyses, AST (author) searched for meta-analyses published in 300 journals via Web of Science (databases: SCI-EXPANDED, SSCI, AHCI, ESCI) on the 25th April 2023 (the search string, which also provides codes for the included journals, can be found in 'search_string.txt'). This search retrieved 3,289 results (potential meta-analyses; a BibTeX list of these is available in this data package) published between 1945 and 2023. AST then extracted a list of all corresponding author email addresses from these articles using the packages revtools 0.4.1 (Westgate 2019a, 2019b) and stringr 1.5.0 (Wickham 2023) in R 4.2.3 (R Core Team, 2021). Code is available as '001_email_compilation.R'. This resulted in 3,346 email addresses; however, 789 emails (~24%) bounced back.
The survey received 232 valid responses (i.e., a 9.1% response rate relative to the 2,557 emails that did not bounce). Because interpreting some of the responses required subjective judgment about what exactly the answer meant, four assessors (AC, AST, ROD, and MG; authors) went through the answers of 188 respondents who had answered ‘YES’ to the question ‘Prior to receiving this survey invite, were you familiar with the concept of Risk of Bias (RoB)?’ or who had answered ‘NO’ but whose remaining answers indicated otherwise. Each assessor provided an answer ‘YES’, ‘NO’, ‘Unsure’, or ‘NA’ to the following six questions:
(1) Has the respondent heard of RoB?
(2) Does the respondent have a correct interpretation of RoB?
(3) Does the respondent claim to have conducted a RoB assessment?
(4) Have they truly conducted RoB assessment?
(5) Respondent thinks RoB is publication bias
(6) Respondent has conducted publication bias, rather than RoB assessment
We then compared the interpretations of the responses among all four assessors. When three or four assessors had the same interpretation, we chose the most common answer as the final one. When there was disagreement, we discussed the interpretations and agreed on whether the final answer should be ‘YES’, ‘NO’, or ‘NA’ (i.e., the question was irrelevant given the previous answers, we could not agree on the interpretation, or we agreed that the answer was too vague to interpret). For the analyses, we used the data table containing both the original responses and each evaluator's scores ('Survey_scores_per_evaluator.csv') and the table of post hoc agreed scores ('Survey_final_scores.csv').
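The agreement rule described here (accept the most common answer when at least three of the four assessors agree, otherwise resolve by discussion) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `consensus` and the `min_agree` parameter are hypothetical, and cases needing discussion are simply returned as `None`.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus(scores: Sequence[str], min_agree: int = 3) -> Optional[str]:
    """Return the majority answer, or None when discussion is needed.

    Assumption: `scores` holds one answer per assessor for a single
    respondent and question, e.g. 'YES', 'NO', 'Unsure' or 'NA'.
    """
    if not scores:
        return None
    answer, count = Counter(scores).most_common(1)[0]
    return answer if count >= min_agree else None


print(consensus(["YES", "YES", "YES", "NO"]))  # three agree -> YES
print(consensus(["YES", "YES", "NO", "NO"]))   # split -> None (discuss)
```

Returning `None` rather than picking a tie-breaker keeps the discussion step explicit, since the text leaves those cases to the assessors' joint judgment.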
2) Journals and RoB assessment
Between 19th April 2025 and 12th May 2025, AC, OP, and ROD (authors) checked the websites of 275 journals that publish ecology and evolutionary biology research. The list of journals was taken from Ivimey-Cook et al. (2025). We checked each journal’s Aims & Scope, Author instructions, and Editorial policy sections to determine whether the journal accepts evidence synthesis, and whether it mentions Risk of Bias or any related concepts for authors of evidence synthesis articles.
We first piloted our data extraction on 10 journals to adjust the data extraction questions and align responses. We used Google Forms for data extraction. The main questions included:
(1) Does the journal explicitly solicit some form of evidence synthesis?
(2) Does the journal specifically mention RoB or related assessment of primary literature in guidelines to authors of evidence synthesis, or is RoB/related assessment specifically mentioned in linked other guidelines (e.g., journal states something like ‘follow PRISMA guidelines when reporting MA’ but no further detail on RoB or related is mentioned)?
(3) What concept related to RoB (including RoB itself) does the journal or a linked guideline exactly mention?
(4) If the journal links or refers to external guidelines that mention RoB or related assessment, what are these guidelines?
(5) What specific RoB or related tool/checklist is mentioned in journal guidelines, or in linked guidelines?
(6) What is the strength of the journal’s policy on the use of RoB or related assessment?
The following decisions were made in light of the pilot extraction. First, we followed this definition of evidence synthesis: ‘Evidence syntheses are conducted in an unbiased, reproducible way to provide evidence for practice and policy-making, as well as to identify gaps in the research. Evidence syntheses may also include a meta-analysis, a more quantitative process of synthesising and visualising data retrieved from various studies’ (https://guides.library.cornell.edu/evidence-synthesis). Second, journals that explicitly solicit narrative reviews and similar (e.g. the Annual Review of Ecology, Evolution and Systematics solicits ‘essay reviews’ but not systematic or quantitative reviews) were scored as ‘NO’ for question (1) above, whereas journals that do not explicitly solicit evidence synthesis, but something more general (e.g. review articles, reviews and comprehensive synthesis, reviews) were scored as ‘Unsure’ for the same question.
We divided journals across reviewers, and one reviewer checked each journal. The data table containing reviewer initials and their scores is 'Journals_risk_of_bias'.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PainT study participant data.