10 datasets found

CompArg: Comparative Sentences 2019
anthology.aicmu.ac.cn
webis.de
3237552
Updated 2019
Cite
Chris Biemann; Matthias Hagen (2019). CompArg: Comparative Sentences 2019 [Dataset]. http://doi.org/10.5281/zenodo.3237552
Explore at:
Available download formats: 3237552
Unique identifier
https://doi.org/10.5281/zenodo.3237552
Dataset updated
2019
Dataset provided by
Friedrich Schiller University Jena
The Web Technology & Information Systems Network
Authors
Chris Biemann; Matthias Hagen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CompArg: Comparative Sentences 2019 dataset for comparative argument mining is composed of sentences annotated with BETTER / WORSE markers (the first object is better / worse than the second object) or NONE (the sentence does not contain a comparison of the target objects). The BETTER sentences stand for a pro-argument in favor of the first compared object and WORSE-sentences represent a con-argument and favor the second object.
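Because the labels are defined relative to the order of the two compared objects, a small sketch may help show how they can be read. The record layout, field names, and example sentence below are hypothetical and chosen only for illustration; the dataset's actual columns may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Reversing the object order turns BETTER into WORSE and vice versa.
FLIPPED = {"BETTER": "WORSE", "WORSE": "BETTER", "NONE": "NONE"}


@dataclass(frozen=True)
class ComparativeSentence:
    # Hypothetical record layout, not the dataset's real schema.
    sentence: str
    first_object: str
    second_object: str
    label: str  # one of BETTER, WORSE, NONE

    def favored_object(self) -> Optional[str]:
        """Return the object the sentence argues for, or None if there is no comparison."""
        if self.label == "BETTER":
            return self.first_object
        if self.label == "WORSE":
            return self.second_object
        return None

    def swapped(self) -> "ComparativeSentence":
        """Return the same sentence with the object order reversed and the label flipped."""
        return ComparativeSentence(
            self.sentence, self.second_object, self.first_object, FLIPPED[self.label]
        )


# Invented example sentence, not taken from the dataset.
example = ComparativeSentence(
    "Python is easier to learn than Java.", "Python", "Java", "BETTER"
)
```

Under these assumptions, `example.swapped()` carries the label WORSE, yet `favored_object()` returns the same object for both orderings, which is why the object order has to be kept alongside the label.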
AC2-VMCOMG103 - Malekula comparative grammar (draft) by A. Capell
researchdata.edu.au
Updated Apr 7, 2016
Cite
PARADISEC (2016). AC2-VMCOMG103 - Malekula comparative grammar (draft) by A. Capell [Dataset]. http://doi.org/10.4225/72/56EC1E44E0FA3
Explore at:
Unique identifier
https://doi.org/10.4225/72/56EC1E44E0FA3
Dataset updated
Apr 7, 2016
Dataset provided by
PARADISEC
Time period covered
Jan 1, 1970
Area covered

Description
19-page typed manuscript. -- Draft prepared by Capell. Part IV: Malekula Comparative Grammar. Foreword and sound laws of Malekula. Date of recording unknown. Language as given: Malekula
Sample sentences from CompSent-19 dataset [9], with preference indications....
plos.figshare.com
figshare.com
xls
Updated May 27, 2025
Cite
Marzieh Babaali; Afsaneh Fatemi; Mohammad Ali Nematbakhsh (2025). Sample sentences from CompSent-19 dataset [9], with preference indications. Note: Sequence matters—preferences reference the initial entity in comparison to the subsequent one. [Dataset]. http://doi.org/10.1371/journal.pone.0319824.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0319824.t003
Dataset updated
May 27, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Marzieh Babaali; Afsaneh Fatemi; Mohammad Ali Nematbakhsh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sample sentences from CompSent-19 dataset [9], with preference indications. Note: Sequence matters—preferences reference the initial entity in comparison to the subsequent one.
COVID-19 Open Research Dataset Sentence Clustering
kaggle.com
zip
Updated Apr 6, 2020
Cite
Rajasankar Viswanathan (2020). COVID-19 Open Research Dataset Sentence Clustering [Dataset]. https://www.kaggle.com/rajasankar/covid19-open-research-dataset-sentence-clustering
Explore at:
Available download formats: zip (74817024 bytes)
Dataset updated
Apr 6, 2020
Authors
Rajasankar Viswanathan
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Context

Finding useful information in 30,000 papers is a hard task, and understanding all of those papers takes time. This method uses advanced AI to find and extract similar patterns from text data in an unsupervised way. The result is equivalent to comparing every sentence with every other sentence by brute force.
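The brute-force comparison described above can be sketched in a few lines. This is only an illustration of the all-pairs idea: the description does not say which similarity measure is used, so Jaccard overlap of content words stands in for it here, and the stopword list, threshold, and function names are assumptions.

```python
from itertools import combinations

# Tiny illustrative stopword list; an assumption, not the list used for the dataset.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to", "with"}


def content_words(sentence: str) -> frozenset:
    """Lowercase the sentence, trim punctuation, and drop stopwords."""
    tokens = (t.strip(".,;:()").lower() for t in sentence.split())
    return frozenset(t for t in tokens if t and t not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of words the two sets have in common (stand-in similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(sentences, threshold=0.5):
    """Compare every sentence with every other one; n sentences need n*(n-1)/2 comparisons."""
    bags = [content_words(s) for s in sentences]
    return [
        (i, j)
        for i, j in combinations(range(len(sentences)), 2)
        if jaccard(bags[i], bags[j]) >= threshold
    ]


sentences = [
    "Recommendations for initial therapy for adult patients with mild disease",
    "Recommendations for initial therapy for children with mild disease",
    "The data is cleaned and stopwords are removed",
]
pairs = similar_pairs(sentences)  # index pairs of sentences judged similar
```

With millions of sentences the number of pairs grows quadratically, so a literal all-pairs loop like this one would be far too slow at that scale; the sketch only shows what an equivalent brute-force result would contain.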

How this is different from other AI methods

This method goes beyond sentence-level co-occurrence pattern finding. Because it compares each sentence with the other sentences, it extracts similar or comparable patterns between sentences rather than the co-occurrence patterns that other methods find.

Because it compares concepts and patterns rather than words, hidden but related words or phrases can be found easily. In other words, it goes beyond keyword search to bring all the related sentences together in one place, which also reduces the amount of reading required.

Content

This dataset groups similar sentences using unsupervised learning methods, so it extracts sentences that are nearly alike. Because the method is fully unsupervised, the data contains some noise that may not be useful.

The data was cleaned, stopwords were removed, and only English-language papers were considered. The final result is 4.5 million sentences, which were processed to find relevant clusters of sentences with the desired similarity.

One example is given below.

For full text of the paper, please refer to https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge data.

title : Antimicrobial treatment guidelines for acute bacterial rhinosinusitis Executive Summary SINUS AND ALLERGY HEALTH PARTNERSHIP* paper id : 32d8d8a2e5e0a499c98a53c9f71a22469752247e line : Antibiotics can be placed into the following relative rank order of predicted clinical efficacy for adults: 90% to 92% ch respiratory fluoroquinolones (gatifloxacin, levofloxacin, moxifloxacin), ceftriaxone, high-dose amoxicillin/clavulanate (4 g/250 mg/day), and amoxicillin/clavulanate (1.75 g/250 mg/day); 83% to 88% ch high-dose amoxicillin (4 g/day), amoxicillin (1.5 g/day), cefpodoxime proxetil, cefixime (based on H influenzae and M catarrhalis coverage), cefuroxime axetil, cefdinir, and TMP/SMX; 77% to 81% ch doxycycline, clindamycin (based on gram-positive coverage only), azithromycin, clarithromycin and erythromycin, and telithromycin; 65% to 66% ch cefaclor and loracarbef.

title : Antimicrobial treatment guidelines for acute bacterial rhinosinusitis Executive Summary SINUS AND ALLERGY HEALTH PARTNERSHIP* paper id : 32d8d8a2e5e0a499c98a53c9f71a22469752247e line : Antibiotics can be placed into the following relative rank order of predicted clinical efficacy in children with ABRS: 91% to 92% ch ceftriaxone, high-dose amoxicillin/clavulanate (90 mg/6.4 mg per kg per day) and amoxicillin/clavulanate (45 mg/6.4 mg per kg per day); 82% to 87% ch highdose amoxicillin (90 mg/kg per day), amoxicillin (45 mg/kg per day), cefpodoxime proxetil, cefixime (based on H influenzae and M catarrhalis coverage only), cefuroxime axetil, cefdinir, and TMP/SMX; and 78% to 80% ch clindamycin (based on gram-positive coverage only), cefprozil, azithromycin, clarithromycin, and erythromycin; 67% to 68% ch cefaclor and loracarbef.

title : Antimicrobial treatment guidelines for acute bacterial rhinosinusitis Executive Summary SINUS AND ALLERGY HEALTH PARTNERSHIP* paper id : 32d8d8a2e5e0a499c98a53c9f71a22469752247e line : Recommendations for initial therapy for adult patients with mild disease (who have not received antibiotics in the previous 4 to 6 weeks) include the following choices: amoxicillin/clavulanate (1.75 to 4 g/250 mg per day), amoxicillin (1.5 to 4 g/day), cefpodoxime proxetil, cefuroxime axetil, or cefdinir.

title : Antimicrobial treatment guidelines for acute bacterial rhinosinusitis Executive Summary SINUS AND ALLERGY HEALTH PARTNERSHIP* paper id : 32d8d8a2e5e0a499c98a53c9f71a22469752247e line : Recommendations for initial therapy for children with mild disease and who have not received antibiotics in the previous 4 to 6 weeks include the following: high-dose amoxicillin/clavulanate (90 mg/6.4 mg per kg per day), amoxicillin (90 mg/kg per day), cefpodoxime proxetil, cefuroxime axetil, or cefdinir.

title : Antimicrobial treatment guidelines for acute bacterial rhinosinusitis Executive Summary SINUS AND ALLERGY HEALTH PARTNERSHIP* paper id : 32d8d8a2e5e0a499c98a53c9f71a22469752247e line : The relative antimicrobial activity against isolates of S pneumoniae based on PK/PD breakpoints, 89 can be listed as: gatifloxacin / levofloxacin / moxifloxacin ([?]99%); ceftriaxone / high-dose amoxicillin (Ti clavulanate [extended-release or extra strength]) (95% to 97%); amoxicillin (Ti clavulanate) / clindamycin (90% to 92%) ; cefpodoxime proxetil /cefuroxime axetil / cefdinir /erythromycin /cla...
AC2-VCGS303 - Comparative Grammar notes of some Vanuatu languages
researchdata.edu.au
Updated Mar 18, 2016
Cite
PARADISEC (2016). AC2-VCGS303 - Comparative Grammar notes of some Vanuatu languages [Dataset]. http://doi.org/10.4225/72/56EC19FF46757
Explore at:
Unique identifier
https://doi.org/10.4225/72/56EC19FF46757
Dataset updated
Mar 18, 2016
Dataset provided by
PARADISEC
Time period covered
Jan 1, 1970
Area covered

Description
1-page manuscript. -- Draft by Capell. The Comparative Grammar notes. Languages are Efate, Pangkumu, Malo, Baki, Bieri, Tanna and Efate. Date of recording unknown. Language as given: Efate, Pangkumu (Rerep), Malo, Baki, Bieri (Bieria), Tanna, Futuna
Recidivism rates in individuals receiving community sentences: A systematic...
plos.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 1, 2023
Cite
Denis Yukhnenko; Achim Wolf; Nigel Blackwood; Seena Fazel (2023). Recidivism rates in individuals receiving community sentences: A systematic review [Dataset]. http://doi.org/10.1371/journal.pone.0222495
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0222495
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Denis Yukhnenko; Achim Wolf; Nigel Blackwood; Seena Fazel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objective: We aimed to systematically review recidivism rates in individuals given community sentences internationally. We sought to explore sources of variation between these rates and how reporting practices may limit their comparability across jurisdictions. Finally, we aimed to adapt previously published guidelines on recidivism reporting to include community sentenced populations.

Methods: We searched MEDLINE, PsycINFO, SAGE and Google Scholar for reports and studies of recidivism rates using non-specific and targeted searches for the 20 countries with the largest prison populations worldwide. We identified 28 studies with data from 19 countries. Of the 20 countries with the largest prison populations, only 2 reported recidivism rates for individuals given community sentences.

Results: The most commonly reported recidivism information between countries was for 2-year reconviction, which ranged widely from 14% to 43% in men, and 9% to 35% in women. Explanations for recidivism rate variations between countries include when the follow-up period started and whether technical violations were taken into account.

Conclusion: Recidivism rates in individuals receiving community sentences are typically lower in comparison to those reported in released prisoners, although these two populations differ in terms of their baseline characteristics. Direct comparisons of the recidivism rates in community sentenced cohorts across jurisdictions are currently not possible, but simple changes to existing reporting practices can facilitate these. We propose recommendations to improve reporting practices.
Comparative preference types with examples.
plos.figshare.com
xls
Updated May 27, 2025
Cite
Marzieh Babaali; Afsaneh Fatemi; Mohammad Ali Nematbakhsh (2025). Comparative preference types with examples. [Dataset]. http://doi.org/10.1371/journal.pone.0319824.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0319824.t005
Dataset updated
May 27, 2025
Dataset provided by
PLOS ONE
Authors
Marzieh Babaali; Afsaneh Fatemi; Mohammad Ali Nematbakhsh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The extraction of subjective comparative relations is essential in the field of question answering systems, playing a crucial role in accurately interpreting and addressing complex questions. To tackle this challenge, we propose the SCQRE model, specifically designed to extract subjective comparative relations from questions by focusing on entities, aspects, constraints, and preferences. Our approach leverages multi-task learning, the Natural Language Inference (NLI) paradigm, and a specialized adapter integrated into RoBERTa_base_go_emotions to enhance performance in Element Extraction (EE), Compared Elements Identification (CEI), and Comparative Preference Classification (CPC). Key innovations include handling X- and XOR-type preferences, capturing implicit comparative nuances, and the robust extraction of constraints often neglected in existing models. We also introduce the Smartphone-SCQRE dataset, along with another domain-specific dataset, Brands-CompSent-19-SCQRE, both structured as subjective comparative questions. Experimental results demonstrate that our model outperforms existing approaches across multiple question-level and sentence-level datasets and surpasses recent language models, such as GPT-3.5-turbo-0613, Llama-2-70b-chat, and Qwen-1.5-7B-Chat, showcasing its effectiveness in question-based comparative relation extraction.
Average length of prison sentences for offences in England and Wales 2023/24...
statista.com
Updated Sep 12, 2025
Cite
Statista (2025). Average length of prison sentences for offences in England and Wales 2023/24 [Dataset]. https://www.statista.com/statistics/1100192/prison-sentence-length-in-england-and-wales-by-offence/
Explore at:
Dataset updated
Sep 12, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Apr 1, 2023 - Mar 31, 2024
Area covered
Wales
Description
In 2023/24 the average custodial sentence length for sexual offences in England and Wales was 69.5 months, or almost six years, the longest of any broad offence type in that year. Other crimes that carried long prison sentences were robbery offences at 45.2 months and drug offences at 41.4 months. The average length of a prison sentence for all offences in 2024 was 22.5 months, while the offences that carried the shortest sentence lengths were motoring offences.

Court backlog a major concern

The number of crown court cases awaiting trial in England and Wales reached a high of over 67,573 cases in late 2023, almost double the number of outstanding cases in 2019. Although the number of new crown court cases has actually been declining, the courts have struggled to keep pace by closing existing cases, particularly during the COVID-19 pandemic. As a consequence of these pressures, the amount of time between a criminal offence taking place and the conclusion of the case has also risen. In 2014, it took an average of 412 days for an offence to reach a conclusion in the courts, with this rising to 697 days by 2021.

The UK prison system

The prison population of the United Kingdom was estimated at approximately 97,800 people as of 2024, the vast majority of whom were in England and Wales. In 2023/24, the average cost of a prison place in England and Wales was estimated at 56,987 British pounds, compared with 51,724 pounds in the previous financial year. Of the various prisons across UK jurisdictions, the largest in terms of capacity was HMP Oakwood in the West Midlands, which had a prison population of 2,121 in 2025. Despite the construction of relatively new prisons such as Oakwood, prison overcrowding has increased recently. In September 2023, for example, there were just 768 spare prison places in England and Wales compared with almost 2,600 in April 2022.
Grammar intervention in young children with DLD (Calder et al., 2020)
asha.figshare.com
pdf
Updated May 30, 2023
Cite
Samuel D. Calder; Mary Claessen; Susan Ebbels; Suze Leitão (2023). Grammar intervention in young children with DLD (Calder et al., 2020) [Dataset]. http://doi.org/10.23641/asha.11958771.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.23641/asha.11958771.v1
Dataset updated
May 30, 2023
Dataset provided by
American Speech–Language–Hearing Association (https://www.asha.org/)
Authors
Samuel D. Calder; Mary Claessen; Susan Ebbels; Suze Leitão
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose: This study evaluated the efficacy of an explicit, combined metalinguistic training and grammar facilitation intervention aimed at improving regular past tense marking for nine children aged 5;10–6;8 (years;months) with developmental language disorder.

Method: This study used an ABA across-participant multiple-baseline single-case experimental design. Participants were seen one-on-one twice a week for 20- to 30-min sessions for 10 weeks and received explicit grammar intervention combining metalinguistic training using the SHAPE CODING system with grammar facilitation techniques (a systematic cueing hierarchy). In each session, 50 trials to produce the target form were completed, resulting in a total of 1,000 trials over 20 individual therapy sessions. Repeated measures of morphosyntax were collected using probes, including trained past tense verbs, untrained past tense verbs, third-person singular verbs as an extension probe, and possessive ’s as a control probe. Probing contexts included expressive morphosyntax and grammaticality judgment. Outcome measures also included pre–post standard measures of expressive and receptive grammar.

Results: Analyses of repeated measures demonstrated significant improvement in past tense production on trained verbs (eight of nine children) and untrained verbs (seven of nine children), indicating efficacy of the treatment. These gains were maintained for 5 weeks. The majority of children made significant improvement on standardized measures of expressive grammar (eight of nine children). Only five of nine children improved on grammaticality judgment or receptive measures.

Conclusion: Results continue to support the efficacy of explicit grammar interventions to improve past tense marking in early school-aged children. Future research should aim to evaluate the efficacy of similar interventions with group comparison studies and determine whether explicit grammar interventions can improve other aspects of grammatical difficulty for early school-aged children with developmental language disorder.

Supplemental Materials:
S1. Expressive raw scores of participants on trained past tense verbs within-session.
S2. Expressive raw scores of participants on trained past tense verbs between-session.
S3. Expressive raw scores of participants on untrained past tense verbs.
S4. Expressive scores of participants on third-person singular (extension).
S5. Summary of Tau-U analyses for expressive repeated measures baseline versus treatment phase contrasts on untrained third-person singular targets (extension).
S6. Graph of % correct on expressive third-person singular repeated measures (extension).
S7. Expressive raw scores of participants on possessive ’s (control).
S8. Summary of expressive repeated measures baseline versus treatment phase contrasts on untrained possessive ’s targets (control).
S9. Graph of % correct on expressive possessive ’s repeated measures (control).
S10. Grammaticality judgment raw scores of participants on trained past tense verbs within session.
S11. Grammaticality judgment raw scores of participants on trained past tense verbs between-session.
S12. Grammaticality judgment raw scores of participants on untrained past tense verbs.
S13. Summary of grammaticality judgment repeated measures baseline versus treatment phase contrasts on trained and untrained targets.
S14. Graph of % correct on grammaticality judgment within-session repeated measures.
S15. Graph of % correct on grammaticality judgment between-session repeated measures.
S16. Graph of % correct on expressive untrained repeated measures.
S17. Grammaticality judgment raw scores of participants on third-person singular (extension).
S18. Summary of grammaticality judgment repeated measures baseline versus treatment phase contrasts on untrained third-person singular targets (extension).
S19. Graph of % correct on grammaticality judgment third-person singular repeated measures (extension).
S20. Grammaticality judgment raw scores of participants on possessive ’s (control).
S21. Summary of grammaticality judgment repeated measures baseline versus treatment phase contrasts on untrained possessive ’s targets (control).
S22. Graph of % correct on grammaticality judgment possessive ’s repeated measures (control).

Calder, S. D., Claessen, M., Ebbels, S., & Leitão, S. (2020). Explicit grammar intervention in young school-aged children with developmental language disorder: An efficacy study using single-case experimental design. Language, Speech, and Hearing Services in Schools, 51(2), 298-316. https://doi.org/10.1044/2019_LSHSS-19-00060

Publisher Note: This article is part of the Forum: Morphosyntax Assessment and Intervention for Children.
The Korean speech recognition sentences (Song et al., 2023)
asha.figshare.com
xlsx
Updated Sep 27, 2023
Cite
Jieun Song; Byunjun Kim; Minjeong Kim; Paul Iverson (2023). The Korean speech recognition sentences (Song et al., 2023) [Dataset]. http://doi.org/10.23641/asha.24045582.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.23641/asha.24045582.v1
Dataset updated
Sep 27, 2023
Dataset provided by
American Speech–Language–Hearing Association (https://www.asha.org/)
Authors
Jieun Song; Byunjun Kim; Minjeong Kim; Paul Iverson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose: The aim of this study was to develop and validate a large Korean sentence set with varying degrees of semantic predictability that can be used for testing speech recognition and lexical processing.

Method: Sentences differing in the degree of final-word predictability (predictable, neutral, and anomalous) were created with words selected to be suitable for both native and nonnative speakers of Korean. Semantic predictability was evaluated through a series of cloze tests in which native (n = 56) and nonnative (n = 19) speakers of Korean participated. This study also used a computer language model to evaluate final-word predictabilities; this is a novel approach that the current study adopted to reduce human effort in validating a large number of sentences, which produced results comparable to those of the cloze tests. In a speech recognition task, the sentences were presented to native (n = 23) and nonnative (n = 21) speakers of Korean in speech-shaped noise at two levels of noise.

Results: The results of the speech-in-noise experiment demonstrated that the intelligibility of the sentences was similar to that of related English corpora. That is, intelligibility was significantly different depending on the semantic condition, and the sentences had the right degree of difficulty for assessing intelligibility differences depending on noise levels and language experience.

Conclusions: This corpus (1,021 sentences in total) adds to the target languages available in speech research and will allow researchers to investigate a range of issues in speech perception in Korean.

Supplemental Material S1. Full list of sentences.

Song, J., Kim, B., Kim, M., & Iverson, P. (2023). The Korean speech recognition sentences: A large corpus for evaluating semantic context and language experience in speech perception. Journal of Speech, Language, and Hearing Research, 66(9), 3399–3412. https://doi.org/10.1044/2023_JSLHR-23-00137