5 datasets found
  1. Grammar transformations of topographic feature type annotations of the U.S. to structured graph data

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Jul 20, 2024
    Cite
    U.S. Geological Survey (2024). Grammar transformations of topographic feature type annotations of the U.S. to structured graph data. [Dataset]. https://catalog.data.gov/dataset/grammar-transformations-of-topographic-feature-type-annotations-of-the-u-s-to-structured-g
    Explore at:
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. The objectives of our study were to analyze the semantic structure of the input definitions, use this information to build triple structures of RDF graph data, upload our lexicon to knowledge graph software, and perform SPARQL queries on the data. The study showed that SPARQL queries effectively retrieved graph triples that carried semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. The data were collected in 2024 by passing text through multiple Python programs built on spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before being processed by the Python programs, the input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized, and each token was characterized by its part-of-speech, tag, dependency relation, dependency head, and lemma. Every word within the lexicon was tokenized. A stop-words list was used only to remove punctuation and symbols from the text; hyphenated words (e.g., bowl-shaped) were kept intact. The tokens’ lemmas were then aggregated and totaled to find their recurrences within the lexicon. The procedure was repeated to tokenize noun chunks from the same glossary definitions.
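    The token-filtering and lemma-aggregation steps described above can be sketched in plain Python. The token rows below are hypothetical stand-ins for spaCy output (in the study, each row came from the transformer pipeline's part-of-speech, tag, dependency, and lemma attributes); only the counting logic is illustrated here.

```python
from collections import Counter

# Hypothetical token rows standing in for spaCy output:
# (text, part_of_speech, tag, dependency, head, lemma)
tokens = [
    ("A", "DET", "DT", "det", "depression", "a"),
    ("bowl-shaped", "ADJ", "JJ", "amod", "depression", "bowl-shaped"),
    ("depression", "NOUN", "NN", "ROOT", "depression", "depression"),
    (",", "PUNCT", ",", "punct", "depression", ","),
    ("or", "CCONJ", "CC", "cc", "depression", "or"),
    ("depressions", "NOUN", "NNS", "conj", "depression", "depression"),
]

# The stop-words step removes punctuation and symbols only;
# hyphenated words such as "bowl-shaped" are kept intact.
kept = [t for t in tokens if t[1] not in ("PUNCT", "SYM")]

# Aggregate lemmas and total their recurrences within the lexicon.
lemma_counts = Counter(t[5] for t in kept)
print(lemma_counts["depression"])  # prints 2: "depression" and "depressions" share a lemma
```

    Counting lemmas rather than surface forms is what lets inflectional variants collapse into a single lexicon entry.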

  2. Dataset of authors, books and publication dates of book series where books equals Adverbs : a graphic guide to grammar

    • workwithdata.com
    Updated Nov 25, 2024
    Cite
    Work With Data (2024). Dataset of authors, books and publication dates of book series where books equals Adverbs : a graphic guide to grammar [Dataset]. https://www.workwithdata.com/datasets/book-series?col=book_series%2Cj0-author%2Cj0-book%2Cj0-publication_date&f=1&fcol0=j0-book&fop0=%3D&fval0=Adverbs+%3A+a+graphic+guide+to+grammar&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered where the book is Adverbs : a graphic guide to grammar. It features 4 columns: book series, authors, books, and publication dates.

  3. Replication Data for: "The category of throw verbs as productive source of the Spanish inchoative construction"

    • search.dataone.org
    • dataverse.no
    • +1 more
    Updated Sep 25, 2024
    Cite
    Van Hulle, Sven; Enghels, Renata (2024). Replication Data for: "The category of throw verbs as productive source of the Spanish inchoative construction." [Dataset]. http://doi.org/10.18710/TR2PWJ
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    DataverseNO
    Authors
    Van Hulle, Sven; Enghels, Renata
    Time period covered
    Jan 1, 1200 - Jan 1, 2000
    Description

    The dataset contains the quantitative data used to create the tables and graphics in the article "The category of throw verbs as productive source of the Spanish inchoative construction." The data from the 21st century originate from the Spanish Web Corpus (esTenTen18), accessed via Sketch Engine; only the subcorpus for European Spanish data was selected. After downloading, the samples were manually cleaned. In the dataset, at most 500 tokens were retained per auxiliary. For the earlier centuries, the data were extracted from the Corpus Diacrónico del Español (CORDE). See Spanish_ThrowVerbs_Inchoatives_queries_20230413.txt for the specific corpus queries that were used.

    The data were annotated for the infinitive observed after the preposition 'a' and for the semantic class to which this infinitive belongs, following the existing ADESSE classification (see below), besides other criteria that are not taken into account for this study. Concretely, the variables 'Century', 'INF' (infinitive), and 'Class' were used as input for the analysis (see the data-specific sections below for more information about the variables).

    The empirical analysis is based on the downloaded data from the Spanish Web Corpus (esTenTen18) (Kilgarriff & Renau 2013). The Spanish Web Corpus contains 20.3 billion words, of which 3.5 billion belong to the European Spanish domain. The corpus contains internet data, with observations originating from fora, blogs, Wikipedia, etc. Only the subcorpus with European Spanish data was consulted. The search syntax used to detect the inchoative construction was the following: “[lemma="echar"] [tag="R.*"]{0,3}"a"[tag="V.*"] within ” (consult Spanish_ThrowVerbs_Inchoatives_queries_20230413.txt for all corpus queries). After downloading, all the observations were manually cleaned. In total, after the removal of false positives, the dataset contains 5514 tokens, with a maximum of 500 tokens per auxiliary.

    False positives were, for example, tagging errors that wrongly coded nouns such as Superman, Pokémon, and Irán as infinitives, as well as observations in which the auxiliary combined with the infinitive did not express the inchoative value but its original semantic meaning, such as "saltar a nadar", which means “to jump to swim” and not “to start to swim”. For auxiliaries with fewer than 500 relevant tokens in the esTenTen corpus, all tokens were retained; for auxiliaries with more than 500 tokens, only the first 500 were selected. For this specific study on the throw verbs, only the following auxiliaries were retained: arrojar, disparar, echar, lanzar, and tirar. For the diachronic data, the Corpus Diacrónico del Español (CORDE) was consulted; see Spanish_ThrowVerbs_Inchoatives_queries_20230413.txt for the specific queries used to retrieve the data from CORDE.
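    The per-auxiliary cap described above (keep all tokens when an auxiliary has fewer than 500 relevant hits, otherwise keep only the first 500) can be sketched as follows; the observation tuples are hypothetical placeholders, not rows from the actual dataset.

```python
from collections import defaultdict

# Hypothetical cleaned observations: (auxiliary, infinitive, century).
# 700 hits for "echar" and 300 for "tirar" stand in for real corpus output.
observations = [("echar", "correr", 20)] * 700 + [("tirar", "andar", 19)] * 300

MAX_PER_AUX = 500  # cap stated in the dataset documentation

kept = []
counts = defaultdict(int)
for aux, inf, century in observations:
    # Keep the first MAX_PER_AUX tokens of each auxiliary, in corpus order.
    if counts[aux] < MAX_PER_AUX:
        kept.append((aux, inf, century))
        counts[aux] += 1

print(len(kept))  # prints 800: 500 "echar" + all 300 "tirar"
```

    Auxiliaries under the cap contribute all of their tokens, which is why the total (5514) is not simply 500 times the number of auxiliaries.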

  4. Machine Number Sense Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 14, 2021
    Cite
    Wenhe Zhang; Chi Zhang; Yixin Zhu; Song-Chun Zhu (2021). Machine Number Sense Dataset [Dataset]. https://paperswithcode.com/dataset/machine-number-sense
    Explore at:
    Dataset updated
    Feb 14, 2021
    Authors
    Wenhe Zhang; Chi Zhang; Yixin Zhu; Song-Chun Zhu
    Description

    The dataset consists of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG). The problems take the form of geometric figures: each problem has a set of geometric shapes as its context, with number symbols embedded in them.

  5. Input utterance length (Alt et al., 2021)

    • asha.figshare.com
    Updated May 31, 2023
    Cite
    Mary Alt; Cecilia R. Figueroa; Heidi M. Mettler; Nora Evans-Reitz; Jessie A. Erikson (2023). Input utterance length (Alt et al., 2021) [Dataset]. http://doi.org/10.23641/asha.14226641.v1
    Explore at:
    Available download formats: PDF
    Dataset updated
    May 31, 2023
    Dataset provided by
    American Speech-Language-Hearing Association (http://www.asha.org/)
    Authors
    Mary Alt; Cecilia R. Figueroa; Heidi M. Mettler; Nora Evans-Reitz; Jessie A. Erikson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: This study examined the efficacy of the Vocabulary Acquisition and Usage for Late Talkers (VAULT) treatment in a version that manipulated the length of the clinician utterance in which a target word was presented (dose length). The study also explored ways to characterize treatment responders versus nonresponders.

    Method: Nineteen primarily English-speaking late-talking toddlers (aged 24–34 months at treatment onset) received VAULT and were quasi-randomly assigned to have target words presented in grammatical utterances matching one of two lengths: brief (four words or fewer) or extended (five words or more). Children were measured on their pre- and posttreatment production of (a) target and control words specific to treatment and (b) words not specific to treatment. Classification and Regression Tree (CART) analysis was used to classify responders versus nonresponders.

    Results: VAULT was successful as a whole (i.e., treatment effect sizes greater than 0), with no difference between the brief and extended conditions. Despite the overall significant treatment effect, the treatment was not successful for all participants. CART results (using participants from the current study and a previous iteration of VAULT) provided a dual-node decision tree for classifying treatment responders versus nonresponders.

    Conclusions: The input-based VAULT treatment protocol is efficacious and offers some flexibility in terms of utterance length. When VAULT works, it works well. The CART decision tree uses pretreatment vocabulary levels and performance in the first two treatment sessions to provide clinicians with promising guidelines for who is likely to be a nonresponder and thus might need a modified treatment plan.

    Supplemental Material S1: Individual performance for participants identified as responders (effect size > 0) across baseline and treatment sessions for target and control words. The treatment condition for each participant (brief, i.e., 4 words or fewer; extended, i.e., 5 words or more) is indicated at the top of each graph.

    Supplemental Material S2: Individual performance for participants identified as nonresponders across baseline and treatment sessions for target and control words. The treatment condition for each participant (brief, i.e., 4 words or fewer; extended, i.e., 5 words or more) is indicated at the top of each graph.

    Alt, M., Figueroa, C. R., Mettler, H. M., Evans-Reitz, N., & Erikson, J. A. (2021). A vocabulary acquisition and usage for late talkers treatment efficacy study: The effect of input utterance length and identification of responder profiles. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2020_JSLHR-20-00525
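    The shape of a dual-node CART tree of the kind described in the study (split first on pretreatment vocabulary, then on early-session performance) can be sketched as a plain function. The thresholds below are invented for illustration and are not the values reported by Alt et al. (2021).

```python
def classify(pretreatment_vocab: int, early_session_score: float) -> str:
    """Hypothetical dual-node decision rule; thresholds are illustrative only."""
    VOCAB_CUTOFF = 50   # hypothetical node 1: pretreatment vocabulary size
    EARLY_CUTOFF = 0.2  # hypothetical node 2: performance in first two sessions
    if pretreatment_vocab >= VOCAB_CUTOFF:
        return "responder"
    # Children below the vocabulary cutoff are split again on early performance.
    return "responder" if early_session_score >= EARLY_CUTOFF else "nonresponder"

print(classify(10, 0.1))  # prints "nonresponder"
```

    A two-node tree like this is easy for clinicians to apply by hand, which is presumably why the study reports it as a screening guideline rather than a full model.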

