The dataset used in the paper is the Universal Dependencies v1.2 dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental procedures for deep enzymology reactions with randomized substrates:

For analysis of the flanking sequence preferences of the TET enzymes, an approach similar to that described for DNMTs (Emperle et al., 2019; Gao et al., 2020; Adam et al., 2020; Dukatz et al., 2020) was used. Briefly, the following single-stranded oligonucleotides, each containing a methylated or hydroxymethylated CpG or CpH site flanked by 10 randomized nucleotides on either side, were obtained from IDT, and primer extension was performed to obtain the double-stranded DNA substrates. A CpN substrate was prepared as a mixture of CpG and CpH in a 1:3 ratio. For the randomized hydroxymethylated substrate, the single-stranded oligo was purchased coupled to Desthiobiotin-TEG. Primer extension was conducted, and the substrate was purified via Streptavidin beads (Dynabeads M-280, ThermoFisher Scientific) and eluted with a biotin solution.

HM rand.  GAGTGTGACTAGGCTCTCACTGCCNNNNNNNNNN mC GNNNNNNNNNNGAGAGGAGACCTAGTGAGAAG
OH rand.  GAGTGTGACTAGGCTCTCACTGCCNNNNNNNNNN hmC GNNNNNNNNNNGAGAGGAGACCTAGTGAGAAG
CH rand.  GAGTGTGACTAGGCTCTCACTGCCNNNNNNNNNN mC HNNNNNNNNNNGAGAGGAGACCTAGTGAGAAG

The randomized double-stranded substrates were incubated with the TET enzyme at 37 °C for 45 min (CN context) or 1 h (CG context). Reaction mixtures contained 1x reaction buffer (50 mM HEPES pH 6.8, 100 mM NaCl, 1 mM DTT, 1 mM alpha-ketoglutarate and 2 mM ascorbic acid) and 100 µM ammonium iron(II) sulfate, with different enzyme concentrations and variable amounts of dialysis buffer to keep the salt and glycerol concentrations fixed. Reactions were stopped by freezing in liquid nitrogen. Afterwards, the enzyme was inactivated by Proteinase K (NEB) treatment for 1 h at 50 °C, followed by purification with a PCR clean-up kit (MACHEREY-NAGEL). Hairpin ligation and bisulfite conversion were performed using the EZ DNA Methylation-Lightning kit (ZYMO).
Library preparation for Illumina Next Generation Sequencing was conducted using a two-step PCR approach as described in Gao et al. (2020). Unique combinations of barcode and index sequences were introduced to distinguish different samples and experiments. For bioinformatic analysis of the NGS datasets, a local instance of a Galaxy server (Afgan et al., 2018) was used. Sequence reads were trimmed with Trim Galore! (Galaxy Version 0.4.3.1, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), keeping only the sequences with a quality score above 20 for further analysis, and filtered according to the expected DNA size using the Filter FASTQ tool (Blankenberg et al., 2010). The data in this entry contain the FASTQ sequence files and extracted DNA sequences obtained with the hemimethylated CpG substrate (HM CG), the hemimethylated CpN substrate mixture (HM CN) and the hemihydroxymethylated CpG substrate (OH CG). Enzyme kinetics were conducted with TET1 and two versions of TET2 (V1 and V2) as described in the accompanying paper. Individual repeats of experiments are indicated with R1-R5 as appropriate. Control reactions refer to samples treated identically but without enzyme. The cited references are listed in the publication accompanying this dataset.
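As an illustration of how the randomized flanks relate to the substrate design given in this entry, the following is a minimal sketch that pulls the two 10-nucleotide flanks out of a sequence matching the design. It is not the authors' Galaxy workflow. It assumes a plain, unconverted sequence: reads taken after bisulfite conversion carry converted bases in the constant regions, so a real analysis would have to account for that. The function name `extract_flanks` and the example sequence are hypothetical.

```python
import re

# Constant regions copied from the oligonucleotide design in this entry.
LEFT = "GAGTGTGACTAGGCTCTCACTGCC"
RIGHT = "GAGAGGAGACCTAGTGAGAAG"

# Design: LEFT, 10 random bases, the target C, the next base
# (G for CpG; A, C or T for CpH), 10 random bases, RIGHT.
SITE = re.compile(LEFT + r"([ACGT]{10})C([ACGT])([ACGT]{10})" + RIGHT)


def extract_flanks(sequence):
    """Return (upstream flank, context, downstream flank) or None.

    Assumption: `sequence` is an unconverted sequence following the design.
    Bisulfite-converted reads would need a conversion-aware pattern.
    """
    match = SITE.search(sequence.upper())
    if match is None:
        return None
    upstream, next_base, downstream = match.groups()
    context = "CpG" if next_base == "G" else "CpH"
    return upstream, context, downstream


# Hypothetical example sequence built from the design.
example = LEFT + "ACGTACGTAC" + "CG" + "TTTTTAAAAA" + RIGHT
print(extract_flanks(example))
```

Sequences that do not contain both constant regions at the expected spacing return `None`, which mirrors the idea of filtering reads by expected size before analysis.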
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Feb 21: updated to v1.2. Please see https://github.com/y-zheng18/point_odyssey/tree/main for release notes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Readme file for ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health
Generated on 2021-02-15

Recommended citation for the dataset:
P. Surana, M. Yusuf and S. Singh, "Severity Classification of Mental Health-Related Tweets," 2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 2021, pp. 336-341, DOI: 10.1109/DISCOVER52564.2021.9663651.

******************************
PROJECT INFORMATION
******************************

1. Title of dataset: Mental Health Dataset

2. Author information:
Praatibh Surana, Manipal Institute of Technology
Mirza Yusuf, Manipal Institute of Technology
Sanjay Singh, Manipal Institute of Technology

Principal Investigators
Name: Praatibh Surana
Address: Manipal Institute of Technology
Email: praatibhsurana@gmail.com

Name: Mirza Yusuf
Address: Manipal Institute of Technology
Email: baig.yusuf.cr7@gmail.com

Co-Investigator
Name: Sanjay Singh
Address: Manipal Institute of Technology
Email: sanjay.singh@manipal.edu

3. Date of data collection: Jan 2021 - Feb 2021

************************************
DATA ACCESS INFORMATION
************************************

1. Licences/restrictions placed on access to the dataset: CC BY 4.0

2. Links to publications that use the data:
URL: https://ieeexplore.ieee.org/document/9663651
DOI: 10.1109/DISCOVER52564.2021.9663651

3. Links to third-party or secondary data used in the project (for example, existing datasets, third-party datasets):
Pennington, Jeffrey et al. “GloVe: Global Vectors for Word Representation.” EMNLP (2014).
DOI: https://doi.org/10.3115/v1/d14-1162

*****************************************
METHODS OF DATA COLLECTION
*****************************************

1. Describe the methods for data collection and/or provide links to papers describing data collection methods
Paper DOI:

Our research revolved around correctly classifying tweets based on their severity in a mental health context. An effort was also made to make the models detect sarcasm better, as many past models failed to do so. Our dataset consists of tweets labeled as ‘0’, ‘1’, or ‘2’ depending on their class. The labeling rules followed are given in Table 1.

TABLE 1 - SEVERITY CLASSIFICATION CLASSES AND EXAMPLES
----------------------------------------------------------------------
Class | Rules | Example
0 | Helping / suggestion for mental health awareness / positivity / informative / motivational / questions about mental health | Are you suffering from anxiety? Check out this page for therapy through Skype!
1 | Sarcasm / rant / expression of annoyance | Today was so annoying. If my teacher would have called my name, I swear to God I would have killed myself
2 | Case of slight disturbance / strong indication of disturbance / user outright mentions depression / anxiety / suicide / self-harm | All I am is a burden. I don’t want to live anymore.
----------------------------------------------------------------------

The following steps were performed for data collection:
1) Tweets were extracted from users with the help of Twitter’s official API using hashtags such as #depression, #mentalhealth, #anxiety, #selfharm, #killmyself, and #kms.
2) Around 40,000 tweets were extracted from Twitter between January and February 2021, from which the final dataset of 2460 tweets was assembled, with 820 tweets in each of the three classes.
3) Two authors manually annotated the dataset and cross-verified it to ensure accurate labeling.

2. Data processing methods:

A. Preprocessing
1) Removal of numbers, URLs, usernames, and special characters: The first step after extraction of the tweets was ensuring that they were suitable for further use. The Python library “preprocessor” was used to eliminate numbers, retweets, URLs, emojis, emoticons, and usernames, followed by removal of duplicate tweets from the dataset.
2) Stopword removal and expansion of standard abbreviations: We made use of Python’s “nltk” library for the removal of common stopwords such as “for,” “the,” “a,” etc. As our data is sourced from Twitter, many common internet abbreviations like “lol,” “kms,” “gn,” etc., were used. These were converted to their corresponding full forms. Many short forms like “wanna” for “want to” and “gonna” for “going to” were also used. We ensured that these, too, were handled to the best of our abilities.
3) Removal of names to maintain anonymity: Names of people, places, Twitter handles, and anything else that could compromise anonymity have been removed and replaced with the token ‘[redacted]’.

*******************************
SUMMARY OF DATA FILE
*******************************

Filename: MentalHealthTweets.csv
Short description: This CSV file contains 2460 tweets annotated ‘0’, ‘1’ or ‘2’ based on the class they belong to.

*******************************************************************
DATA-SPECIFIC INFORMATION
*******************************************************************

1. Number of variables: 2
2. Number of cases/rows: 2461
3. Missing data codes: NA
4. Variable list
The variables and their properties are given in Table 2.

TABLE 2 - VARIABLE DESCRIPTION TABLE
----------------------------------------------------------------------
Variable Name | Variable Description | Variable Type
tweets | Cleaned up tweet | String
label | Annotation for tweet | Integer
----------------------------------------------------------------------
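The preprocessing steps described in the readme above (removal of URLs, usernames, numbers and special characters, expansion of short forms, stopword removal, and duplicate removal) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' pipeline, which used the “preprocessor” and “nltk” libraries. The stopword set and abbreviation map are tiny illustrative stand-ins, the expansion given for “gn” is an assumption, and lowercasing and the handling of hashtags are also assumptions. Name redaction is not covered. The function names `clean_tweet` and `clean_corpus` are hypothetical.

```python
import re

# Illustrative stand-ins (assumptions): the readme names nltk stopwords and
# gives "wanna"/"gonna" as examples; the expansion for "gn" is assumed here.
ABBREVIATIONS = {"wanna": "want to", "gonna": "going to", "gn": "good night"}
STOPWORDS = {"for", "the", "a"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # numbers, emojis, punctuation


def clean_tweet(text):
    """Clean a single tweet; lowercasing is an assumption of this sketch."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    words = []
    for token in text.split():
        # Expand known short forms, then split multi-word expansions.
        words.extend(ABBREVIATIONS.get(token, token).split())
    return " ".join(word for word in words if word not in STOPWORDS)


def clean_corpus(tweets):
    """Clean every tweet, dropping empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        result = clean_tweet(tweet)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned


print(clean_tweet("@user I wanna sleep for 10 hours https://t.co/x #tired"))
print(clean_corpus(["Gonna rest", "gonna rest!", "@someone"]))
```

Expanding abbreviations before removing stopwords means that words introduced by an expansion are filtered as well; whether the authors applied the steps in this order is not stated in the readme.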
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Table 2