Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset accompanies a paper to be published in "Morphology" (JOMO, Springer). All data generated for this research, as well as all scripts used, are stored under the present DOI. The paper itself is not CC-licensed; refer to Springer's "Morphology" website for details.
Abstract: In this paper, we take a closer theoretical and empirical look at the linking elements in German N1+N2 compounds which are identical to the plural marker of N1 (such as -er with umlaut, as in Häus-er-meer 'sea of houses'). Various perspectives on the actual extent of plural interpretability of these pluralic linking elements are expressed in the literature. We aim to clarify this question by empirically examining to what extent a relationship between plural form and plural meaning determines the sorts of compounds in which pluralic linking elements appear. Specifically, we investigate whether pluralic linking elements occur especially frequently in compounds where a plural meaning of the first constituent is induced either externally (through plural inflection of the entire compound) or internally (through a relation between the constituents such that N2 forces N1 to be conceptually plural, as in the example above). The results of a corpus study using the DECOW16A corpus and a split-100 experiment show that in the internal but not the external plural meaning condition, a pluralic linking element is preferred over a non-pluralic one, though there is considerable inter-speaker variability, and limitations imposed by other constraints on linking element distribution also play a role. Overall, however, we show a tendency for German language users to use pluralic linking elements as cues to the plural interpretation of N1+N2 compounds. Our interpretation does not reference a specific morphological framework. Instead, we view our data as strengthening the general approach of probabilistic morphology.
This dataset provides information on over 8,000 English words, including nouns and their plural forms. It is particularly useful for investigating the relationships between words and their plural forms.
See the dataset description for more information.
File: adjectives.csv
File: adverbs.csv
File: nouns.csv

| Column name | Description |
|:------------|:------------|
| 007 | The code name of the character. (String) |
| 007s | The number of times the character has been used. (Integer) |

File: plural-nouns.csv
File: verbs.csv

| Column name | Description |
|:------------|:------------|
| awake | (adjective) to stop sleeping |
| awoke | (verb) to stop sleeping |
| awoken | (verb) to stop sleeping |

File: words-multiple-present-participle.csv

| Column name | Description |
|:------------|:------------|
| Word | The word being described. (String) |
| Present Participle | The present participle form of the word. (String) |
| Present Participle Alternative | An alternative present participle form of the word. (String) |

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file provides the singular powers and collocation points described in the paper "On the Approximation of Singular Functions by Series of Non-integer Powers," available on arXiv, for several useful combinations of the parameters a, b, and the precision ε. It also includes a MATLAB script which demonstrates the effectiveness of these singular powers and collocation points for approximating singular functions of the form x^c, where c is in the interval [a,b].
Dataset description: The General Regionally Annotated Corpus of Ukrainian (GRAC, Shvedova et al. 2017-2024, uacorpus.org) was consulted to collect data for further analysis concerning the distribution of Singular vs. Plural verb forms in the target bahato construction. GRAC is a Sketch Engine corpus of over 1.8 billion words, representing texts from over 30,000 authors created between 1816 and 2023. This corpus is designed to serve as source material for linguistic research on Standard Ukrainian. Our data was collected during the month of February 2024. We extracted and annotated 28,491 examples of the bahato construction. An additional set of examples was collected from the Russian National Corpus (ruscorpora.ru) during the month of August 2024 to provide comparison with the Russian mnogo construction. For this purpose, 6,612 examples were extracted and annotated for word order and Singular vs. Plural verb agreement. Both the Ukrainian and the Russian data are included in this dataset, along with the R scripts used to analyze this data.
Article abstract: We reveal an ongoing language change in Ukrainian involving a construction with a subject comprised of the indefinite quantifier багато ‘many’ modifying a noun phrase in the Genitive Plural. Number agreement on the verb varies, allowing both Singular (in 69.1% of attestations) and Plural (in 30.9% of attestations). Based on statistical analysis of corpus data, we investigate the influence of the factors of year of creation, word order of subject and verb, and animacy of the subject on the choice of verb number. We find that, while all combinations of word order and animacy are robustly attested, VS word order and inanimate subjects tend to prefer Singular, whereas SV word order and animate subjects tend to prefer Plural. Since about the 1950s, the proportion of Plural has been increasing, overtaking Singular in the current decade. We propose that this Singular vs. Plural variation is motivated by the human embodied experience of construing a group of items as either a homogeneous mass (and therefore Singular) or a multiplicity of individuals (and therefore Plural). This proposal is supported by the identification of micro-constructions that prefer Singular and show reduced individuation of human beings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dozens of missing epochs in the monthly gravity product of the satellite mission Gravity Recovery and Climate Experiment (GRACE) and its follow-on (GRACE-FO) mission greatly inhibit the complete analysis and full utilization of the data. Despite previous attempts to handle this problem, a general all-purpose gap-filling solution is still lacking. Here we propose a non-parametric, data-adaptive and easy-to-implement approach, composed of the Singular Spectrum Analysis (SSA) gap-filling technique, cross-validation, and spectral testing for significant components, to produce reasonable gap-filling results in the form of spherical harmonic coefficients (SHCs). We demonstrate that this approach is adept at inferring missing data from long-term and oscillatory changes extracted from available observations. A comparison in the spectral domain reveals that the gap-filling result closely resembles the product of the GRACE missions below spherical harmonic degree 30. As the degree increases above 30, the amplitude per degree of the gap-filling result decreases more rapidly than that of the GRACE/GRACE-FO SHCs, showing effective suppression of noise. As a result, our approach can reduce noise in the oceans without sacrificing resolution on land. The gap-filling dataset is stored in the "SSA_filing/" folder. Each file represents a monthly result in the form of spherical harmonics. The data format follows the convention of the site ftp://isdcftp.gfz-potsdam.de/grace/. Low-degree corrections (degree-1, C20, C30) have been made. The code to generate the dataset is located in the "code_share/" folder, with an example for C30. The model-based Greenland mass balance result for data validation (results given in the paper) is provided in the "Greenland_SMB-D.txt" file.
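The iterative idea behind SSA gap filling can be illustrated on a single time series, such as the monthly values of one spherical harmonic coefficient. The following is a minimal sketch, not the code in the "code_share/" folder: the function names, the window length, and the number of retained components are hypothetical, and the cross-validation and spectral significance testing that the description uses to choose them are omitted.

```python
import numpy as np


def ssa_reconstruct(x, window, rank):
    """Rank-limited SSA reconstruction of a complete 1-D series."""
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging maps the low-rank matrix back to a series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts


def ssa_fill_gaps(x, window, rank, n_iter=200, tol=1e-8):
    """Fill NaN gaps by iterating SSA reconstructions (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # Start from the mean of the observed values in every gap.
    filled = np.where(missing, np.nanmean(x), x)
    if not missing.any():
        return filled
    for _ in range(n_iter):
        recon = ssa_reconstruct(filled, window, rank)
        change = np.max(np.abs(recon[missing] - filled[missing]))
        # Only the gap values are updated; observations stay untouched.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled


# Usage: an annual cycle sampled monthly, with three consecutive months missing.
t = np.arange(120)
truth = np.sin(2 * np.pi * t / 12)
series = truth.copy()
series[50:53] = np.nan
filled = ssa_fill_gaps(series, window=24, rank=2)
```

In this toy case the signal is a pure sinusoid, so two components suffice; real coefficient series would need the component selection described above.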
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standard deviations are presented in parentheses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A growing body of work in psycholinguistics suggests that morphological relations between word forms affect the processing of complex words. Previous studies have usually focused on a particular type of paradigmatic relation, for example the relation between paradigm members, or the relation between alternative forms filling a particular paradigm cell. However, potential interactions between different types of paradigmatic relations have remained relatively unexplored. The data in this data set were used in two corpus studies of variable plurals in Dutch to test hypotheses about potentially interacting paradigmatic effects. The first study (which uses the s_dist data) shows that generalization across noun paradigms predicts the distribution of plural variants, and that this effect is diminished for paradigms in which the plural variants are more likely to have a strong representation in the mental lexicon. The second study (which uses the s_dur data) demonstrates that the pronunciation of a target plural variant is affected by coactivation of the alternative variant, resulting in shorter segmental durations. This effect is dependent on the representational strength of the alternative plural variant. In sum, the distributional and durational measurements in these data provide evidence that storage of morphologically complex words may affect the role of generalization and coactivation during production. A full description of the data gathering process and the analyses is given in the Methodology file. The Readme file describes how the remaining files relate to the research.
https://dataverse.no/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.18710/5KCE4U
Dataset description: This dataset, which is adapted from Jenset and McGillivray (2017), contains tabular files documenting the alternating usage of -(e)th and -(e)s to mark third-person verb inflection in Early Modern English. The data provided by Jenset and McGillivray (2017) are drawn from the PPCEME corpus (Kroch et al. 2004) and cover the period from 1500 to 1700. In total, 13,757 third-person singular tokens (excluding the verb BE) were annotated by these authors for a range of variables. For the purposes of the present methodological study, this dataset was reduced to a subset of 11,645 tokens, and the coding of variables was in some parts revised, completed, or modified. The dataset includes information about the Author and Verb Lemma, as well as a number of predictor variables, including Genre, Year, Frequency (of the verb lemma in the third-person singular), Phonological Context (stem-final sound), and the Gender of the author.
Abstract for related publication: Resource constraints often force researchers to down-size the list of tokens returned by a corpus query. This paper sketches a methodology for down-sampling and offers a survey of current practices. We build on earlier work and extend the evaluation of down-sampling designs to settings where tokens are clustered by text file and lexeme. Our case study deals with third-person present-tense verb inflection in Early Modern English and focuses on five predictors: Year, Gender, Genre, Frequency, and Phonological Context. We evaluate two strategies for selecting 2,000 (out of 11,645) tokens: simple down-sampling, where each hit has the same selection probability; and structured down-sampling, where this probability is inversely proportional to the author- and verb-specific token count. We form 500 sub-samples using each scheme and compare regression results to a reference model fit to the full set of cases. We observe that structured down-sampling shows better performance on several evaluation criteria.
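The difference between the two selection strategies can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the field names "author" and "verb" are hypothetical, and the weight used here, one over the product of the author's token count and the verb's token count, is just one assumed reading of "inversely proportional to the author- and verb-specific token count".

```python
import math
import random
from collections import Counter


def simple_downsample(tokens, n, rng):
    """Simple down-sampling: every token has the same selection probability."""
    return rng.sample(tokens, n)


def structured_downsample(tokens, n, rng):
    """Weighted sampling without replacement (Efraimidis-Spirakis keys)."""
    author_counts = Counter(t["author"] for t in tokens)
    verb_counts = Counter(t["verb"] for t in tokens)
    keyed = []
    for index, token in enumerate(tokens):
        # ASSUMPTION: weight = 1 / (author token count * verb token count).
        weight = 1.0 / (author_counts[token["author"]] * verb_counts[token["verb"]])
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        # log(u) / weight orders tokens like u ** (1 / weight), without underflow.
        keyed.append((math.log(u) / weight, index))
    keyed.sort(reverse=True)
    return [tokens[index] for _, index in keyed[:n]]


# Usage: one prolific author and one rarely attested author.
tokens = [{"author": "A", "verb": "have"} for _ in range(900)]
tokens += [{"author": "B", "verb": "say"} for _ in range(10)]
rng = random.Random(0)
simple = simple_downsample(tokens, 10, rng)
structured = structured_downsample(tokens, 10, rng)
```

Under the simple scheme the rare author is almost never drawn, whereas the structured scheme strongly favours tokens from sparsely represented authors and verbs.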
https://www.nist.gov/open/license
Fortran and MATLAB programs, a MATLAB MEX file of the Fortran program, a compiled MEX file, sample data files, and related materials for computing a partial elastic shape registration of two simple surfaces in 3-dimensional space and the elastic shape distance between them corresponding to that partial registration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is at https://github.com/peterrobinson/CTSpellingArticle2024. It is contained in three folders, each folder corresponding to one of the three sets of data used in this analysis, as follows:
“sorted by regularization”. This folder contains spelling data and results derived from the regularization process, where (for example) spellings of forms regularized to “goode” are distinguished from spellings of forms regularized to “god”;
“sorted by part of speech”. This folder contains spelling data and results derived from a lemmatization and part-of-speech identification process, where (for example) spellings of forms lemmatized to “goode” singular adjective are distinguished from spellings of forms regularized to “goode” plural adjective, and forms lemmatized to “gode” singular noun nominative case are distinguished from “gode” singular noun oblique case (as in “to gode”).
“all spellings unsorted”. This folder contains spelling data as undifferentiated counts of “bags of words”, that is, for each witness, so many occurrences of “good”, so many of “goode”, so many of “god”, so many of “gode”.
Each folder contains the following files (under various names):
A .json file holding all the data, structured according to its categorization. The “sorted by part of speech” folder contains two .json files, one with spellings organized by headword lemma, the other organized by part-of-speech;
Two .nex Nexus files containing all the data. In the “sorted by regularization” and “sorted by part of speech” folders one Nexus file groups spellings by variant sites within each line, the other Nexus file groups spellings by words within each line. In the “all spellings unsorted” folder one Nexus file contains all the spellings organized by spelling; the second holds a Nexus distance matrix with distances created according to the Manhattan distance algorithm;
A .dst distance matrix file, containing a distance matrix constructed with distances calculated by the Manhattan distance algorithm;
A “features” file, containing a spreadsheet ranking each variant site according to its impact on the analysis;
Multiple .pdf files visualizing the results of our analysis, with the names reflecting the analysis each contains. Files with names including “Splits” were created using the SplitsTree algorithm and software (Huson and Bryant 2006; “SplitsTree | Universität Tübingen,” n.d.).
The “sorted by regularization” folder also contains a single image file, “tiagoplot1.jpg”, visualizing the results of PCA analysis on the “sorted by regularization” data.
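As an illustration of how a Manhattan distance between two witnesses can be computed from bag-of-words spelling counts, here is a minimal sketch. The witness labels and counts are invented for the example, and the function names are hypothetical; this is not the code used to produce the .dst files.

```python
def manhattan_distance(counts_a, counts_b):
    """Sum of absolute differences in spelling counts between two witnesses."""
    spellings = set(counts_a) | set(counts_b)
    return sum(abs(counts_a.get(s, 0) - counts_b.get(s, 0)) for s in spellings)


def distance_matrix(witnesses):
    """Pairwise distances for a dict mapping witness label to spelling counts."""
    labels = sorted(witnesses)
    return {(a, b): manhattan_distance(witnesses[a], witnesses[b])
            for a in labels for b in labels}


# Hypothetical witnesses with invented counts.
witnesses = {
    "W1": {"good": 3, "goode": 1},
    "W2": {"good": 1, "god": 2},
}
matrix = distance_matrix(witnesses)
# |3 - 1| for "good" + |1 - 0| for "goode" + |0 - 2| for "god" = 5
```

A spelling absent from one witness counts as zero there, so witnesses that share no spellings are separated by the sum of all their counts.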
https://dataverse.no/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.18710/4D2QII
Data and R code are provided for statistical analysis of approximately 39,000 corpus examples of predicate agreement in constructions with quantified subjects in Russian. The analysis indicates that these constructions constitute a network of constructions (“allostructions”) with various preferences for singular or plural agreement. Factors pull in different directions, and we observe a relatively stable situation in the face of variation. We present an analysis of a multidimensional network of allostructions in Russian, thus contributing to our understanding of allostructional relationships in Construction Grammar. With regard to historical linguistics, language stability is an understudied field. We illustrate an interplay of divergent factors that apparently resists language change. The syntax of numerals and other quantifiers represents a notoriously complex phenomenon of the Russian language. Our study sheds new light on the contributions of factors that favor singular or plural agreement in sentences with quantified subjects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standard deviations are shown in parentheses.
We consider a nonlinear Dirichlet problem driven by the (p, q)-Laplacian and with a reaction having the combined effects of a singular term and of a parametric (p−1)-superlinear perturbation. We prove a bifurcation-type result describing the changes in the set of positive solutions as the parameter λ > 0 varies. Moreover, we prove the existence of a minimal positive solution u*_λ and study the monotonicity and continuity properties of the map λ → u*_λ.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Determiners with congruent gender facilitate the recognition of the following noun. We examine two explanations of this effect: either gender information is retrieved and influences lexical access, or gender effects are due to determiner-noun co-occurrence. French nouns are either feminine or masculine and are preceded by feminine or masculine determiners in the singular. Plural articles are unmarked for gender. Because some nouns (peanuts) occur more frequently in the plural than in the singular, they frequently co-occur with determiners that do not provide gender information. Conversely, nouns that occur more frequently in their singular form (cathedral) co-occur more frequently with gender-marked determiners. We examined the recognition of plural- and singular-oriented nouns preceded by gender-marked and unmarked determiners. Singular-oriented nouns were recognised faster after gender-marked (singular) articles than after gender-unmarked (plural) ones. However, plural-oriented nouns were recognised faster after gender-unmarked (plural) articles, suggesting that article-noun co-occurrence outweighs abstract gender cues.
In this paper we suggest a new algorithm for the computation of a best rank one approximation of tensors, called 'alternating singular value decomposition'. This method is based on the computation of maximal singular values and the corresponding singular vectors of matrices. We also introduce a modification for this method and the alternating least squares method, which ensures that alternating iterations will always converge to a semi-maximal point. Finally, we introduce a new simple Newton-type method for speeding up the convergence of alternating methods near the optimum. We present several numerical examples that illustrate the computational performance of the new method in comparison to the alternating least squares method.
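One way to read the alternating scheme for a third-order tensor is sketched below, under stated assumptions: fix one of the three vectors, contract the tensor with it to obtain a matrix, and replace the other two vectors with that matrix's leading singular pair, cycling through the modes. The function names, the starting vector, and the stopping rule are hypothetical, and the paper's modification for convergence to a semi-maximal point and its Newton-type acceleration are not included.

```python
import numpy as np


def top_singular_pair(matrix):
    """Largest singular value with its left and right singular vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, 0], s[0], vt[0]


def alternating_svd_rank_one(tensor, n_sweeps=50, tol=1e-12):
    """Rank-one approximation value * (x outer y outer z) of a 3-way tensor."""
    # ASSUMPTION: a normalised all-ones start; other starts are possible.
    x = np.ones(tensor.shape[0]) / np.sqrt(tensor.shape[0])
    value = 0.0
    for _ in range(n_sweeps):
        # Fix x, update (y, z) from the contracted matrix.
        y, _, z = top_singular_pair(np.einsum("ijk,i->jk", tensor, x))
        # Fix y, update (x, z).
        x, _, z = top_singular_pair(np.einsum("ijk,j->ik", tensor, y))
        # Fix z, update (x, y); the singular value equals T(x, y, z).
        x, new_value, y = top_singular_pair(np.einsum("ijk,k->ij", tensor, z))
        converged = abs(new_value - value) < tol
        value = new_value
        if converged:
            break
    return value, x, y, z


# Usage: an exactly rank-one tensor is recovered.
a = np.array([1.0, 2.0, 2.0]) / 3.0
b = np.array([3.0, 4.0]) / 5.0
c = np.array([2.0, 1.0, 2.0, 0.0]) / 3.0
tensor = 5.0 * np.einsum("i,j,k->ijk", a, b, c)
value, x, y, z = alternating_svd_rank_one(tensor)
approx = value * np.einsum("i,j,k->ijk", x, y, z)
```

Each update solves a matrix problem exactly through the SVD, which is what distinguishes this sketch from alternating least squares, where only one vector is updated at a time.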
Many researchers seem to think that construction grammar posits the existence of just wholly idiosyncratic constructions or form-meaning pairings. However, this idea demonstrates a deep misunderstanding of the approach, since constructions rarely emerge sui generis. Rather, construction grammar aims to balance the fact that some linguistic uses cannot be fully predicted from other well-established uses, with the fact that extensions of a construction, while not predictable, are motivated by other senses in the constructional network. This study illustrates this tenet of constructional approaches to language by providing an analysis of the Spanish completive reflexive marker se. In order to identify the different senses of the completive se-construction I used data from the Spanish corpus CREA (Corpus de Referencia del Español Actual, http://corpus.rae.es/creanet.html). Given the large size of the corpus (200 million words), the frequency search, which is merely indicative, was arbitrarily limited to constructions in which the verb appeared in the 3rd person singular and was directly followed by a direct object headed by the definite articles el ‘the’ (masculine) or la ‘the’ (feminine) in the singular. The data set includes all the instances of the completive reflexive found in the sample described above.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summaries of posterior distributions for singular values and variance components for the BGGE and BGGEE models.
This is tomography data as acquired using a commercial X-ray tomography instrument. We obtained reconstructions of a graded-index optical fiber with voxels of edge length 1.05 µm at 12 tube voltages. The fiber manufacturer created a graded index in the central region by varying the germanium concentration from a peak value in the center of the core to a very small value at the core-cladding boundary. Operating on 12 tube voltages, we show by a singular value decomposition that there are only two singular vectors with significant weight. Physically, this means scans beyond two tube voltages contain largely redundant information. We concentrate on an analysis of the images associated with these two singular vectors. The first singular vector is dominant, and images of the coefficients of the first singular vector at each voxel look similar to any of the single-energy reconstructions. Images of the coefficients of the second singular vector by itself appear to be noise. However, by averaging the reconstructed voxels in each of several narrow bands of radii, we can obtain values of the second singular vector at each radius. In the core region, where we expect the germanium doping to go from a peak value at the fiber center to zero at the core-cladding boundary, we find that a plot of the two coefficients of the singular vectors forms a line in the two-dimensional space consistent with the dopant decreasing linearly with radial distance from the core center. The coating, made of a polymer rather than silica, is not on this line, indicating that the two-dimensional results are sensitive not only to the density but also to the elemental composition. A stack of reconstructions is given here as TIFF files of individual slices. Each zip file corresponds to a tilt series at a given tube voltage, given in the file name. The power is also given in the file name. (For example, file “30kV-2W.zip” was tube voltage at 30kV, power 2W.)
The power was varied so that the signal-to-noise was approximately equal for the various reconstructions. The experiment is described in: ZH Levine, AP Peskin, EJ Garboczi, and AD Holmgren, Multi-Energy X-Ray Tomography of an Optical Fiber: The Role of Spatial Averaging, Microscopy and Microanalysis 25 (1) 70-76 (2019). https://doi.org/10.1017/S1431927618016136
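The claim that only two singular vectors carry significant weight can be illustrated by stacking the reconstructions into a matrix with one row per tube voltage and one column per voxel and inspecting its singular values. The sketch below uses synthetic data rather than the TIFF stacks: the two latent components, the noise level, and the 1% significance threshold are assumptions made for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voltages, n_voxels = 12, 500

# SYNTHETIC stand-in for the data: every voxel is a mix of two latent
# components, each with its own response across the 12 tube voltages.
responses = rng.random((n_voltages, 2))
mixing = rng.random((2, n_voxels))
noise = 1e-3 * rng.standard_normal((n_voltages, n_voxels))
stack = responses @ mixing + noise

# Rows of vt are singular vectors over voxels; s holds their weights.
u, s, vt = np.linalg.svd(stack, full_matrices=False)
relative_weight = s / s[0]

# ASSUMED threshold: call a component significant above 1% of the largest.
n_significant = int(np.sum(relative_weight > 0.01))

# Per-voxel coefficient images of the two leading singular vectors.
coefficients = s[:2, None] * vt[:2]
```

With real data, the rows of `coefficients` would be reshaped to the image grid and, as described above, the second one averaged over narrow radial bands to suppress noise.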
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This site contains the widefield imaging datasets from the publication Cortical State Fluctuations during Sensory Decision Making, by Jacobs et al. in Current Biology. This data is from the behavioural tasks described in the publication, and is in a compressed SVD format (see Methods in the publication for more details). The companion code is designed to take the data in this format. The datasets provided here contain the top 500 singular values, which is how the data in the publication was analysed, as this was found to sufficiently capture the data. The data containing up to 2000 singular values can be shared on request. The timestamps of the datasets here are not all aligned with the behavioural datasets; the companion code takes care of this. The data is organised by experimental subject; most subjects were recorded from on multiple days, which form subfolders within the subject folder. Within a day, there may have been several experiments, which again form subfolders within the day folder. The companion code expects this data organisation. The companion code is available at: https://github.com/eakjacobs/Jacobs_et_al_CurrentBiology
For more information and links to the behavioural and pupil datasets, please follow this link: https://doi.org/10.6084/m9.figshare.13084805
The research article can be found (freely available) at https://www.cell.com/current-biology/fulltext/S0960-9822(20)31437-8
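To show what storing a recording in compressed SVD form means in practice, the sketch below truncates a small synthetic movie to its leading components and rebuilds selected frames from them. The array layout (pixels by components, components by frames) and the function names are assumptions made for illustration; the actual file layout is defined by the publication's Methods and the companion code.

```python
import numpy as np


def compress(movie, n_components):
    """Keep the leading SVD components of a (n_pixels, n_frames) movie."""
    u, s, vt = np.linalg.svd(movie, full_matrices=False)
    spatial = u[:, :n_components]                            # (n_pixels, k)
    temporal = s[:n_components, None] * vt[:n_components]    # (k, n_frames)
    return spatial, temporal


def reconstruct_frames(spatial, temporal, frame_indices):
    """Rebuild the requested frames as columns of a (n_pixels, n) array."""
    return spatial @ temporal[:, frame_indices]


# Usage: a synthetic rank-3 movie of 50 pixels and 40 frames.
rng = np.random.default_rng(0)
movie = rng.random((50, 3)) @ rng.random((3, 40))
spatial, temporal = compress(movie, n_components=3)
frames = reconstruct_frames(spatial, temporal, [0, 5])
```

Because only the two factor matrices are stored, analyses can often be run on the temporal components directly and mapped back to pixel space at the end.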
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/ZWVNQF
A careful examination of the effects of collisions on resonant wave-particle interactions leads to an alternate interpretation and deeper understanding of the quasilinear operator originally formulated by Kennel and Engelmann (Phys. Fluids vol. 9, 1966, pp. 2377-2388) for collisionless, magnetized plasmas, and widely used to model radio frequency heating and current drive. The resonant and nearly resonant particles are particularly sensitive to collisions that pitch angle scatter them out of and into resonance. As a result, the resonant particle-wave interactions occur in the center of a narrow collisional boundary layer when the collision frequency nu is very small compared to the wave frequency omega. The diffusive nature of the pitch angle scattering combined with the wave-particle resonance condition enhances the collision frequency by (omega/nu)^(2/3) >> 1, resulting in an effective resonant particle collision time of tau_int ~ (nu/omega)^(2/3)/nu << 1/nu. A rigorous collisional boundary layer analysis generalizes the standard quasilinear operator to a form that is fully consistent with Kennel-Engelmann, but allows replacing the delta function appearing in the diffusivity with a simple integral (having the appropriate delta function limit) retaining the new physics associated with the narrow boundary layer, while preserving the entropy production principle. The limitations of the collisional boundary layer treatment are also estimated, and indicate that substantial departures from Maxwellian are not permitted.