Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This data has been collected through a standard scraping process from the Medium website, looking for published articles.
Each row in the data is a different article published on Medium. For each article, you have the following features: - title [string]: The title of the article. - text [string]: The text content of the article. - url [string]: The URL associated to the article. - authors [list of strings]: The article authors. - timestamp [string]: The publication datetime of the article. - tags [list of strings]: List of tags associated to the article.
You can find a very quick data analysis in this notebook.
Scraping has been done with Python and the requests library. Starting from a random article on Medium, the next articles to scrape are selected by visiting: 1. The author archive pages. 2. The publication archive pages (if present). 3. The tags archives (if present).
The article HTML pages have been parsed with the newspaper Python library.
Published articles have been filtered for English articles only, using the Python langdetect library.
As a consequence of the collection methodology, the scraped articles are coming from a not uniform publication date distribution. This means that there are articles published in 2016 and in 2022, but the number of articles in this dataset published in 2016 is not the same as the number of articles published in 2022. In particular, there is a strong prevalence of articles published in 2020. Have a look at the accompanying notebook to see the distribution of the publication dates.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundIndividual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and FindingsWe included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. ConclusionsFor these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
Facebook
TwitterDataset for the statistical analysis of the article "Empowerment through Participatory Game Creation: A Case Study with Adults with Intellectual Disability".
Facebook
TwitterShowing statistical analysis of study data.
Facebook
TwitterStatistical analysis.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note. 5-yr IF = five-year Impact Factor in 2011. Articles = number of articles published in 2012. Empirical = number of empirical articles published in 2012.Sample.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundThere is widespread evidence that statistical methods play an important role in original research articles, especially in medical research. The evaluation of statistical methods and reporting in journals suffers from a lack of standardized methods for assessing the use of statistics. The objective of this study was to develop and evaluate an instrument to assess the statistical intensity in research articles in a standardized way.MethodsA checklist-type measure scale was developed by selecting and refining items from previous reports about the statistical contents of medical journal articles and from published guidelines for statistical reporting. A total of 840 original medical research articles that were published between 2007–2015 in 16 journals were evaluated to test the scoring instrument. The total sum of all items was used to assess the intensity between sub-fields and journals. Inter-rater agreement was examined using a random sample of 40 articles. Four raters read and evaluated the selected articles using the developed instrument.ResultsThe scale consisted of 66 items. The total summary score adequately discriminated between research articles according to their study design characteristics. The new instrument could also discriminate between journals according to their statistical intensity. The inter-observer agreement measured by the ICC was 0.88 between all four raters. Individual item analysis showed very high agreement between the rater pairs, the percentage agreement ranged from 91.7% to 95.2%.ConclusionsA reliable and applicable instrument for evaluating the statistical intensity in research papers was developed. It is a helpful tool for comparing the statistical intensity between sub-fields and journals. The novel instrument may be applied in manuscript peer review to identify papers in need of additional statistical review.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General descriptionThis dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.1. Is the research article an Open Access publication?2. Does the research article have a Creative Common license or a similar license?3. Does the research article contain a data availability statement?4. Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?5. Does the research article contain supplementary data?6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?VariablesThe data were compiled in a Microsoft Excel 365 document that includes the following variables.1. DOI URL of research article2. Year of publication3. Research article published with Open Access4. License for research article5. Data availability statement in article6. Supplementary data added to article7. Persistent identifier for supplementary data8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDCVisualizationParts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications during a year, the number of publications that is published with open access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publication sper year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).File formats and softwareThe file formats used in this dataset are:.csv (Text file).docx (Microsoft Word 365 file).jpg (JPEG image file).pdf/A (Portable Document Format for archiving).png (Portable Network Graphics image file).pptx (Microsoft Power Point 365 file).txt (Text file).xlsx (Microsoft Excel 365 file)All files can be opened with Microsoft Office 365 and work likely also with the older versions Office 2019 and 2016. MD5 checksumsHere is a list of all files of this dataset and of their MD5 checksums.1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2),4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b),5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a),6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c),7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b),8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5),9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b),10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793),11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e),12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e),13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe),14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7),15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698),16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a),17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72),18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d),19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Medium is an American online publishing platform launched in August 2012. Crawl Feeds team extracted data from medium articles for research and analysis purposes.
Fields
Total fields: 15
url, crawled_at, id, title, author, published_at, author_url, reading_time, total_claps, raw_description, source, description, tags, images, modified_at
Get complete dataset from crawl feeds over more than 500K+ records Link
Facebook
TwitterDetailed methods, statistical analysis, figures, and references.
Facebook
TwitterThis file contains R code for the data analyzed in the paper from Frontiers in Ecology and Evolution
Facebook
TwitterAttribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
It is a widely accepted fact that evolving software systems change and grow. However, it is less well-understood how change is distributed over time, specifically in object oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place as well as to inform them of the longer term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, as well as to provide useful information as input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. But in order to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and classes that changed recently are likely to undergo modifications in the near future. Though the guidance provided is helpful, developers need more specific guidance in order for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution prone parts of a system as well as support effort estimation activities. The specific research questions that we address in this chapter are: (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project specific, or general? (2) How is modification frequency distributed for classes that change? (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications? (4) Does structural complexity make a class susceptible to change? (5) Does popularity make a class more change-prone? We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions. The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided as a comma separated values (CSV) file, and the first line of the CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
Facebook
TwitterAlthough prehospital emergency anesthesia (PHEA), with a specific focus on intubation attempts, is frequently studied in prehospital emergency care, there is a gap in the knowledge on aspects related to adherence to PHEA guidelines. This study investigates adherence to the “Guidelines for Prehospital Emergency Anesthesia in Adults” with regard to the induction of PHEA, including the decision making, rapid sequence induction, preoxygenation, standard monitoring, intubation attempts, adverse events, and administration of appropriate medications and their side effects. This retrospective study examined PHEA interventions from 01/01/2020 to 12/31/2021 in the city of Aachen, Germany. The inclusion criteria were adult patients who met the indication criteria for the PHEA. Data were obtained from emergency medical protocols. A total of 127 patients were included in this study. All the patients met the PHEA indication criteria. Despite having a valid indication, 29 patients did not receive the PHEA. 98 patients were endotracheally intubated. For these patients, monitoring had conformed to the guidelines. The medications were used according to the guidelines. A significant increase in oxygen saturation was reported after anesthesia induction (p < 0.001). The patients were successfully intubated endotracheally on the third attempt. Guideline adherence was maintained in terms of execution of PHEA, rapid sequence induction, preoxygenation, monitoring, selection, and administration of relevant medications. Emergency physicians demonstrated the capacity to effectively respond to cardiorespiratory events. Further investigations are needed on the group of patients who did not receive PHEA despite meeting the criteria. The underlying causes of decision making in these cases need to be evaluated in the future.
Facebook
TwitterBackgroundEhrlichia canis, a rickettsial organism, is responsible for causing ehrlichiosis, a tick-borne disease affecting dogs.ObjectivesThis study aimed to estimate ehrlichiosis prevalence and identify associated risk factors in pet dogs.MethodsA total of 246 peripheral blood samples were purposively collected from pet dogs in Dhaka, Mymensingh, and Rajshahi districts between December 2018 and December 2020. Risk factor data were obtained through face-to-face interviews with dog owners using a pre-structured questionnaire. Multivariable logistic regression analysis identified risk factors. Polymerase chain reaction targeting the 16S rRNA gene confirmed Ehrlichia spp. PCR results were further validated by sequencing.ResultsThe prevalence and case fatality of ehrlichiosis were 6.9% and 47.1%, respectively. Dogs in rural areas had 5.8 times higher odds of ehrlichiosis (odd ratio, OR: 5.84; 95% CI: 1.72–19.89) compared to urban areas. Dogs with access to other dogs had 5.14 times higher odds of ehrlichiosis (OR: 5.14; 95% CI: 1.63–16.27) than those without such access. Similarly, irregularly treated dogs with ectoparasitic drugs had 4.01 times higher odds of ehrlichiosis (OR: 4.01; 95% CI: 1.17–14.14) compared to regularly treated dogs. The presence of ticks on dogs increased ehrlichiosis odds nearly by 3 times (OR: 3.02; 95% CI: 1.02–8.97). Phylogenetic analysis, based on 17 commercially sequenced isolates, showed different clusters of aggregation, however, BAUMAH-13 (PP321265) perfectly settled with a China isolate (OK667945), similarly, BAUMAH-05 (PP321257) with Greece isolate (MN922610), BAUMAH-16 (PP321268) with Italian isolate (KX180945), and BAUMAH-07 (PP321259) with Thailand isolate (OP164610).ConclusionsPet owners and veterinarians in rural areas should be vigilant in monitoring dogs for ticks and ensuring proper preventive care. Limiting access to other dogs in high-risk areas can help mitigate disease spread. Tick prevention measures and regular treatment with ectoparasitic drugs will reduce the risk of ehrlichiosis in dogs. The observed genetic similarity of the Bangladeshi Ehrlichia canis strain highlights the need for ongoing surveillance and research to develop effective control and prevention strategies, both within Bangladesh and globally.
Facebook
Twitterhttps://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
CNBC Economy Articles Dataset is an invaluable collection of data extracted from CNBC’s economy section, offering deep insights into global and U.S. economic trends, market dynamics, financial policies, and industry developments.
This dataset encompasses a diverse array of economic articles on critical topics like GDP growth, inflation rates, employment statistics, central bank policies, and major global events influencing the market. Designed for researchers, analysts, and businesses, it serves as an essential resource for understanding economic patterns, conducting sentiment analysis, and developing financial forecasting models.
Each record in the dataset is meticulously structured and includes:
This rich combination of fields ensures seamless integration into data science projects, research papers, and market analyses.
Interested in additional structured news datasets for your research or analytics needs? Check out our news dataset collection to find datasets tailored for diverse analytical applications.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IntroductionA required step for presenting results of clinical studies is the declaration of participants demographic and baseline characteristics as claimed by the FDAAA 801. The common workflow to accomplish this task is to export the clinical data from the used electronic data capture system and import it into statistical software like SAS software or IBM SPSS. This software requires trained users, who have to implement the analysis individually for each item. These expenditures may become an obstacle for small studies. Objective of this work is to design, implement and evaluate an open source application, called ODM Data Analysis, for the semi-automatic analysis of clinical study data.MethodsThe system requires clinical data in the CDISC Operational Data Model format. After uploading the file, its syntax and data type conformity of the collected data is validated. The completeness of the study data is determined and basic statistics, including illustrative charts for each item, are generated. Datasets from four clinical studies have been used to evaluate the application’s performance and functionality.ResultsThe system is implemented as an open source web application (available at https://odmanalysis.uni-muenster.de) and also provided as Docker image which enables an easy distribution and installation on local systems. Study data is only stored in the application as long as the calculations are performed which is compliant with data protection endeavors. Analysis times are below half an hour, even for larger studies with over 6000 subjects.DiscussionMedical experts have ensured the usefulness of this application to grant an overview of their collected study data for monitoring purposes and to generate descriptive statistics without further user interaction. The semi-automatic analysis has its limitations and cannot replace the complex analysis of statisticians, but it can be used as a starting point for their examination and reporting.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Copies of Anaconda 3 Jupyter Notebooks and Python script for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data was analyzed holistically using cleaned and standardized survey results and by library type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available in relation to this project: COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type, clustered files available as part of the Dataverse for this project).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article contains consolidated proteomic data obtained from xylem sap collected from tomato plants grown in Fe- and Mn-sufficient control, as well as Fe-deficient and Mn-deficient conditions. Data presented here cover proteins identified and quantified by shotgun proteomics and Progenesis LC-MS analyses: proteins identified with at least two peptides and showing changes statistically significant (ANOVA; p ≤ 0.05) and above a biologically relevant selected threshold (fold ≥ 2) between treatments are listed. The comparison between Fe-deficient, Mn-deficient and control xylem sap samples using a multivariate statistical data analysis (Principal Component Analysis, PCA) is also included. Data included in this article are discussed in depth in "Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses", Ceballos-Laita et al., J. Proteomics, 2018. This dataset is made available to support the cited study as well to extend analyses at a later stage. Resources in this dataset:Resource Title: ProteomeExchange submission PXD007517. Xylem sap shotgun proteomics from Fe- and Mn-deficient and Mn-toxic tomato plants. . File Name: Web Page, url: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD007517 The MS proteomics data have been deposited to the ProteomeXchange Consortium via the Pride partner repository with the data set identifier PXD007517. Also includes FTP location. Files available at https://www.ebi.ac.uk/pride/archive/projects/PXD007517 via HTML, FTP, or Fast (Aspera) download : 1 SEARCH.xml file, 1 Peak file, 24 RAW files, 1 Mascot information.xlsx file. Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.01.034
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).