100+ datasets found
  1. What is your definition of Big Data? Researchers’ understanding of the phenomenon of the decade

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Maddalena Favaretto; Eva De Clercq; Christophe Olivier Schneble; Bernice Simone Elger (2023). What is your definition of Big Data? Researchers’ understanding of the phenomenon of the decade [Dataset]. http://doi.org/10.1371/journal.pone.0228987
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Maddalena Favaretto; Eva De Clercq; Christophe Olivier Schneble; Bernice Simone Elger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The term Big Data is commonly used to describe a range of different concepts: from the collection and aggregation of vast amounts of data, to a plethora of advanced digital techniques designed to reveal patterns related to human behavior. In spite of its widespread use, the term is still loaded with conceptual vagueness. The aim of this study is to examine the understanding of the meaning of Big Data from the perspectives of researchers in the fields of psychology and sociology in order to examine whether researchers consider currently existing definitions to be adequate and investigate if a standard discipline centric definition is possible. Methods: Thirty-nine interviews were performed with Swiss and American researchers involved in Big Data research in relevant fields. The interviews were analyzed using thematic coding. Results: No univocal definition of Big Data was found among the respondents and many participants admitted uncertainty towards giving a definition of Big Data. A few participants described Big Data with the traditional “Vs” definition—although they could not agree on the number of Vs. However, most of the researchers preferred a more practical definition, linking it to processes such as data collection and data processing. Conclusion: The study identified an overall uncertainty or uneasiness among researchers towards the use of the term Big Data which might derive from the tendency to recognize Big Data as a shifting and evolving cultural phenomenon. Moreover, the currently enacted use of the term as a hyped-up buzzword might further aggravate the conceptual vagueness of Big Data.

  2. LScD (Leicester Scientific Dictionary)

    • figshare.le.ac.uk
    docx
    Updated Apr 15, 2020
    + more versions
    Cite
    Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
    Explore at:
    Available download formats: docx
    Dataset updated
    Apr 15, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Leicester
    Description

    LScD (Leicester Scientific Dictionary), April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

[Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation here. After the pre-processing steps, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are also the same as described for LScD Version 2 below.
* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

[Version 2] Getting Started
This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from LSC and instructions for using the code are available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.
LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains a title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.
LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words and is sorted by the number of documents containing the word, in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
1. Unique words in abstracts
2. Number of documents containing each word
3. Number of appearances of a word in the entire corpus

Processing the LSC
Step 1. Downloading the LSC online: Use of the LSC is subject to acceptance of a request for the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.
Step 2. Importing the corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All following steps can be applied to an arbitrary list of texts from any source with changes of parameters. The structure of the corpus, such as the file format and the names (and positions) of fields, should be taken into account when applying our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].
Step 3. Extracting abstracts and saving metadata: The metadata, which include all fields in a document excluding the abstract, and the abstract field are separated. Metadata are then saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
Step 4. Text pre-processing steps on the collection of abstracts: In this section, we present our approaches to pre-processing the abstracts of the LSC.
1. Removing punctuation and special characters: This is the process of substituting all non-alphanumeric characters with a space. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose the actual meaning of such words. A process of uniting prefixes with words is performed in later steps of pre-processing.
2. Lowercasing the text data: Lowercasing is performed to avoid treating words like “Corpus”, “corpus” and “CORPUS” differently. The entire collection of texts is converted to lowercase.
3. Uniting prefixes of words: Words containing prefixes joined with the character “-” are united as one word. The list of prefixes united for this research is listed in the file “list_of_prefixes.csv”. Most of the prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.
4. Substitution of words: Some words joined with “-” in the abstracts of the LSC require an additional substitution process to avoid losing the meaning of the word before removing the character “-”. Some examples of such words are “z-test”, “well-known” and “chi-square”. These words have been substituted by “ztest”, “wellknown” and “chisquare”. Identification of such words is done by sampling abstracts from LSC. The full list of such words and the decisions taken for substitution are presented in the file “list_of_substitution.csv”.
5. Removing the character “-”: All remaining “-” characters are replaced by a space.
6. Removing numbers: All digits that are not included in a word are replaced by a space. All words that contain both digits and letters are kept, because alphanumeric tokens such as chemical formulas might be important for our analysis. Some examples are “co2”, “h2o” and “21st”.
7. Stemming: Stemming is the process of converting inflected words into their word stem. This step unites several forms of words with similar meaning into one form and also saves memory space and time [5]. All words in the LScD are stemmed to their word stem.
8. Stop word removal: Stop words are words that are extremely common but provide little value in a language. Some common stop words in English are ‘I’, ‘the’, ‘a’, etc. We used the ‘tm’ package in R to remove stop words [6]. There are 174 English stop words listed in the package.
Step 5. Writing the LScD into CSV format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file “LScD.csv”.

The Organisation of the LScD
The total number of words in the file “LScD.csv” is 974,238. Each field is described below:
Word: Contains the unique words from the corpus. All words are in lowercase and in their stem forms. The field is sorted by the number of documents that contain the word, in descending order.
Number of Documents Containing the Word: A binary count is used: if a word exists in an abstract, it counts as 1; if the word occurs more than once in a document, the count is still 1. The total number of documents containing the word is the sum of these 1s over the entire corpus.
Number of Appearances in Corpus: How many times a word occurs in the corpus when the corpus is considered as one large document.

Instructions for R Code
LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:
Metadata File: Includes all fields in a document excluding the abstract. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
File of Abstracts: Contains all abstracts after the pre-processing steps defined in Step 4.
DTM: The Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.
LScD: An ordered list of words from LSC as defined in the previous section.
The code can be used as follows:
1. Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’.
2. Open the LScD_Creation.R script.
3. Change the parameters in the script: replace them with the full path of the directory with the source files and the full path of the directory to write output files.
4. Run the full code.

References
[1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
[2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
[5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved Porter's stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
[6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," available online: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
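
    The dictionary itself was built with the R script LScD_Creation.R described above. As a rough, illustrative sketch of the same pre-processing steps (not the authors' code), the following Python snippet builds a small LScD-style document-frequency list; the prefix, substitution and stop-word lists below are tiny hypothetical stand-ins for the dataset's "list_of_prefixes.csv", "list_of_substitution.csv" and the tm package's 174 stop words.

```python
# Illustrative sketch (not the dataset's R pipeline): builds an LScD-style
# document-frequency dictionary from a list of abstracts, following the steps
# described above. Prefix/substitution/stop-word lists are small stand-ins.
import re
from collections import Counter

from nltk.stem import PorterStemmer  # pip install nltk

PREFIXES = {"pre", "non", "self", "extra", "ultra", "per", "e"}    # stand-in list
SUBSTITUTIONS = {"z-test": "ztest", "well-known": "wellknown",
                 "chi-square": "chisquare"}                        # stand-in list
STOP_WORDS = {"i", "the", "a", "an", "and", "of", "in", "to"}      # stand-in list
stemmer = PorterStemmer()

def preprocess(abstract: str) -> list[str]:
    text = abstract.lower()                                   # step 2: lowercase
    for word, repl in SUBSTITUTIONS.items():                  # step 4: substitutions
        text = text.replace(word, repl)
    text = re.sub(r"[^a-z0-9\- ]", " ", text)                 # step 1: keep "-" for now
    # step 3: unite listed prefixes joined with "-" ("pre-processing" -> "preprocessing")
    text = re.sub(r"\b(" + "|".join(PREFIXES) + r")-(\w)", r"\1\2", text)
    text = text.replace("-", " ")                             # step 5: drop remaining "-"
    text = re.sub(r"\b\d+\b", " ", text)                      # step 6: drop stand-alone numbers
    tokens = [stemmer.stem(t) for t in text.split()]          # step 7: stemming
    return [t for t in tokens if t not in STOP_WORDS]         # step 8: stop words

def build_dictionary(abstracts: list[str]) -> list[tuple[str, int, int]]:
    doc_freq, corpus_freq = Counter(), Counter()
    for abstract in abstracts:
        tokens = preprocess(abstract)
        corpus_freq.update(tokens)
        doc_freq.update(set(tokens))                          # binary count per document
    return sorted(((w, doc_freq[w], corpus_freq[w]) for w in doc_freq),
                  key=lambda row: -row[1])                    # sort by document frequency

if __name__ == "__main__":
    demo = ["The z-score and pre-processing of CO2 data.",
            "A well-known chi-square z-test."]
    for word, n_docs, n_total in build_dictionary(demo):
        print(word, n_docs, n_total)
```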

  3. Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Laura Miron; Rafael Gonçalves; Mark A. Musen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen, "Obstacles to the Reuse of Metadata in ClinicalTrials.gov".

Description of files

Original data files:
- AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
- public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

BioPortal API query results:
- condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns = {filename, condition, url, bioportal term, cuis, tuis}.
- intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns = {filename, intervention, url, bioportal term, cuis, tuis}.

Data element definitions:
- supplementary_table_1.xlsx: Mapping of element names, element types, and whether elements are required in the ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations. Column and value definitions:
  - CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html)
  - CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html)
  - CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value, "Group Heading" if the element is a group heading for several sub-fields but is not in itself associated with a user-provided value.
  - Required in CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017 (the effective date of the FDAAA801 Final Rule), "-" if this element is not applicable to interventional records (only observational or expanded access).
  - Required in CT.gov for Observational Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017 (the effective date of the FDAAA801 Final Rule), "-" if this element is not applicable to observational records (only interventional or expanded access).
  - Required in CT.gov for Expanded Access Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017 (the effective date of the FDAAA801 Final Rule), "-" if this element is not applicable to expanded access records (only interventional or observational).
  - CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
  - Required in XSD?: "Yes" if the element is required according to public.XSD, "No" if the element is optional, "-" if the element is not made public or included in the XSD.
  - Type in XSD: "text" if the XSD type was "xs:string" or "textblock", the name of the enum if the type was an enum, "integer" if the type was "xs:integer" or "xs:integer" extended with the "type" attribute, "struct" if the type was a struct defined in the XSD.
  - PRS Element Name: Name of the corresponding entry field in the PRS system.
  - PRS Entry Type: Entry type in the PRS system. This column contains some free-text explanations/observations.
  - FDAAA801 Final Rule Field Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA.
  - WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)

Analytical results:
- EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives the filename, the criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
- completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
- industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
- location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether the record listed at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.

Intermediate results:
- cache.zip contains pickle and CSV files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running the analysis steps from the Jupyter notebooks in our GitHub repository.
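
    As a minimal illustration of how the original files above fit together (assuming they have been unzipped locally; the example record filename is hypothetical), one can validate a record against public.xsd and look up its condition strings in the BioPortal match table:

```python
# Minimal sketch for working with the files described above: validate one
# ClinicalTrials.gov XML record against public.xsd and inspect the BioPortal
# 'exact match' table. Assumes the files have been unzipped into the working directory.
import pandas as pd
from lxml import etree

# Validate a record against the published schema.
schema = etree.XMLSchema(etree.parse("public.xsd"))
record = etree.parse("AllPublicXML/NCT00000102.xml")   # hypothetical example path
print("valid:", schema.validate(record))

# Condition strings are children of the top-level <clinical_study> element.
conditions = [c.text for c in record.findall("condition")]
print("conditions:", conditions)

# BioPortal query results, with the columns documented above.
matches = pd.read_csv("condition_matches.csv")
print(matches.columns.tolist())   # filename, condition, url, bioportal term, cuis, tuis
print(matches[matches["condition"].isin(conditions)].head())
```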

  4. Data from: Digital analysis of cDNA abundance; expression profiling by means of restriction fragment fingerprinting

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    Updated Jul 24, 2025
    Cite
    National Institutes of Health (2025). Digital analysis of cDNA abundance; expression profiling by means of restriction fragment fingerprinting [Dataset]. https://catalog.data.gov/dataset/digital-analysis-of-cdna-abundance-expression-profiling-by-means-of-restriction-fragment-f
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Gene expression profiling among different tissues is of paramount interest in various areas of biomedical research. We have developed a novel method (DADA, Digital Analysis of cDNA Abundance) that calculates the relative abundance of genes in cDNA libraries. Results: DADA is based upon multiple restriction fragment length analysis of pools of clones from cDNA libraries and the identification of gene-specific restriction fingerprints in the resulting complex fragment mixtures. A specific cDNA cloning vector had to be constructed that guarded against missing or incomplete cDNA inserts, which would generate misleading fingerprints in standard cloning vectors. Double-stranded cDNA was synthesized using an anchored oligo dT primer and uni-directionally inserted into the DADA vector, and cDNA libraries were constructed in E. coli. The cDNA fingerprints were generated in a PCR-free procedure that allows for parallel plasmid preparation, labeling, restriction digest and fragment separation of pools of 96 colonies each. This multiplexing significantly enhanced the throughput in comparison to sequence-based methods (e.g. the EST approach). The data of the fragment mixtures were integrated into a relational database system and queried with fingerprints experimentally produced by analyzing single colonies. Due to the limited predictability of the position of DNA fragments of a given size on the polyacrylamide gels, fingerprints derived solely from cDNA sequences were not accurate enough to be used for the analysis. We applied DADA to the analysis of gene expression profiles in a model for impaired wound healing (treatment of mice with dexamethasone). Conclusions: The method proved to be capable of identifying pharmacologically relevant target genes that had not been identified by other standard methods routinely used to find differentially expressed genes. Due to the above-mentioned limited predictability of the fingerprints, the method has so far been tested only with a limited number of experimentally determined fingerprints and was able to detect differences in gene expression of transcripts representing 0.05% of the total mRNA population (e.g. medium-abundance gene transcripts).

  5. Data from 'Language learners privilege structured meaning over surface frequency'

    • dtechtive.com
    • find.data.gov.scot
    csv, txt
    Updated Aug 23, 2017
    Cite
    University of Edinburgh. School of Philosophy, Psychology and Language Sciences (2017). Data from 'Language learners privilege structured meaning over surface frequency' [Dataset]. http://doi.org/10.7488/ds/2120
    Explore at:
    Available download formats: csv(0.0073 MB), csv(0.0071 MB), csv(0.007 MB), csv(0.0072 MB), txt(0.0166 MB), csv(0.0079 MB), csv(0.0074 MB), csv(0.0075 MB), txt(0.1036 MB), csv(0.0076 MB)
    Dataset updated
    Aug 23, 2017
    Dataset provided by
    University of Edinburgh. School of Philosophy, Psychology and Language Sciences
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Although it is widely agreed that learning the syntax of natural languages involves acquiring structure-dependent rules, recent work on acquisition has nevertheless attempted to characterize the outcome of learning primarily in terms of statistical generalizations about surface distributional information. In this paper we investigate whether surface statistical knowledge or structural knowledge of English is used to infer properties of a novel language under conditions of impoverished input. We expose learners to artificial-language patterns that are equally consistent with two possible underlying grammars--one more similar to English in terms of the linear ordering of words, the other more similar on abstract structural grounds. We show that learners' grammatical inferences overwhelmingly favor structural similarity over preservation of superficial order. Importantly, the relevant shared structure can be characterized in terms of a universal preference for isomorphism in the mapping from meanings to utterances. Whereas previous empirical support for this universal has been based entirely on data from cross-linguistic language samples, our results suggest it may reflect a deep property of the human cognitive system--a property that, together with other structure-sensitive principles, constrains the acquisition of linguistic knowledge.

  6. Expression data from SPHINX (SPaceflight of Huvec: an INtegrated eXperiment)...

    • catalog.data.gov
    • data.nasa.gov
    • +2more
    Updated Apr 24, 2025
    + more versions
    Cite
    National Aeronautics and Space Administration (2025). Expression data from SPHINX (SPaceflight of Huvec: an INtegrated eXperiment) [Dataset]. https://catalog.data.gov/dataset/expression-data-from-sphinx-spaceflight-of-huvec-an-integrated-experiment-cca1a
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Changes in the physical environment modulate cell responses and may lead to the impairment or even failure of tissue function as a result of mechanotransduction processes. It has been suggested that this situation occurs in some age-related diseases and some pathological conditions observed in space, such as cardiovascular deconditioning, bone loss, muscle atrophy and impaired immune responses. All of these are associated with endothelial dysfunction, but the precise mechanism is still unclear. We used the microarray approach to obtain insights into the mechanism responsible for endothelial dysfunction by taking advantage of the challenging environment of gravitational unloading onboard the International Space Station. The effects of gravitational unloading on HUVEC gene expression were investigated by means of cDNA microarray analyses of six randomly chosen samples (three for each of the two conditions, spaceflight and 1g) using Affymetrix Human Gene 1.0 ST Arrays.

  7. Data from: Source Strength Functions from Long-Term Monitoring Data and...

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Source Strength Functions from Long-Term Monitoring Data and Spatially Distributed Mass Discharge Measurements [Dataset]. https://catalog.data.gov/dataset/source-strength-functions-from-long-term-monitoring-data-and-spatially-distributed-mass-di
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Source strength functions (SSF), defined as contaminant mass discharge or flux-averaged concentration from dense nonaqueous phase liquid (DNAPL) source zones as a function of time, provide a quantitative model of DNAPL source-zone behavior. Such information is useful for making site management decisions. We investigate the use of historic data collected during long-term monitoring (LTM) activities at a site in Rhode Island to predict the SSF based on temporal mass discharge measurements at a fixed location, as well as SSF estimation using mass discharge measurements at a fixed time from three spatially distributed control planes. Mass discharge based on LTM data decreased from ~300 g/day in 1996 to ~70 g/day in 2012 at a control plane downgradient of the suspected DNAPL source zone, and indicates an overall decline of ~80% in 16 years. These measurements were compared to current mass discharge measurements across three spatially distributed control planes. Results indicate that mass discharge increased in the downgradient direction, and was ~6 g/day, ~37 g/day, and ~400 g/day at near, intermediate, and far distances from the suspected source zone, respectively. This behavior was expected given the decreasing trend observed in the LTM data at a fixed location. These two data sets were compared using travel time as a means to plot the data sets on a common axis. The similarity between the two data sets gives greater confidence to the use of this combined data set for site-specific SSF estimation relative to either the sole use of LTM or spatially distributed data sets. This dataset is associated with the following publication: Brooks, M.C., A.L. Wood, J. Cho, C.A.P. Williams, B. Brandon, and M.D. Annable. Source strength functions from long-term monitoring data and spatially distributed mass discharge measurements. JOURNAL OF CONTAMINANT HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 219: 28-39, (2018).
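
    As a quick arithmetic check, the quoted endpoint values are consistent with the stated overall decline:

```latex
\frac{300 - 70\ \text{g/day}}{300\ \text{g/day}} \approx 0.77 \approx 80\% \quad \text{over the 16-year record (1996--2012)}
```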

  8. Data from: Supplementary data for the research paper "Haploinsufficiency of the intellectual disability gene SETD5 disturbs developmental gene expression and cognition"

    • research-explorer.ista.ac.at
    • b2find.eudat.eu
    Updated Apr 15, 2025
    Cite
    Dotter, Christoph; Novarino, Gaia (2025). Supplementary data for the research paper "Haploinsufficiency of the intellectual disability gene SETD5 disturbs developmental gene expression and cognition" [Dataset]. https://research-explorer.ista.ac.at/record/6074
    Explore at:
    Dataset updated
    Apr 15, 2025
    Authors
    Dotter, Christoph; Novarino, Gaia
    Description

    This dataset contains the supplementary data for the research paper "Haploinsufficiency of the intellectual disability gene SETD5 disturbs developmental gene expression and cognition".

    The contained files have the following content:
- 'Supplementary Figures.pdf': Additional figures (as referenced in the paper).
- 'Supplementary Table 1. Statistics.xlsx': Details on statistical tests performed in the paper.
- 'Supplementary Table 2. Differentially expressed gene analysis.xlsx': Results for the differential gene expression analysis for embryonic (E9.5; analysis with edgeR) and in vitro (ESCs, EBs, NPCs; analysis with DESeq2) samples.
- 'Supplementary Table 3. Gene Ontology (GO) term enrichment analysis.xlsx': Results for the GO term enrichment analysis for differentially expressed genes in embryonic (GO E9.5) and in vitro (GO ESC, GO EBs, GO NPCs) samples. Differentially expressed genes for in vitro samples were split into upregulated and downregulated genes (up/down) and the analysis was performed on each subset (e.g. GO ESC up / GO ESC down).
- 'Supplementary Table 4. Differentially expressed gene analysis for CFC samples.xlsx': Results for the differential gene expression analysis for samples from adult mice before (HC, homecage) and 1 h and 3 h after contextual fear conditioning. Each sheet shows the results for a different comparison. Sheets 1-3 show results for comparisons between timepoints for wild type (WT) samples only and sheets 4-6 for the same comparisons in mutant (Het) samples. Sheets 7-9 show results for comparisons between genotypes at each time point and sheet 10 contains the results for the analysis of differential expression trajectories between wild type and mutant.
- 'Supplementary Table 5. Cluster identification.xlsx': Results for k-means clustering of genes by expression. Sheet 1 shows clustering of just the genes with significantly different expression trajectories between genotypes. Sheet 2 shows clustering of all genes that are significantly differentially expressed in any of the comparisons (this also includes genes with the same trajectories).
- 'Supplementary Table 6. GO term cluster analysis.xlsx': Results for the GO term enrichment analysis and EWCE analysis for enrichment of cell-type-specific genes for each cluster identified by clustering genes with different expression trajectories (see Table S5, sheet 1).
- 'Supplementary Table 7. Setd5 mass spectrometry results.xlsx': Results showing proteins interacting with Setd5 as identified by mass spectrometry. Sheet 1 shows protein-protein interaction data generated from these results (combined with data from the STRING database). Sheet 2 shows the results of the statistical analysis with limma.
- 'Supplementary Table 8. PolII ChIP-seq analysis.xlsx': Results for the ChIP-seq analysis of binding of RNA polymerase II (PolII). Sheet 1 shows results for differential binding of PolII at the transcription start site (TSS) between genotypes and sheets 2+3 show the corresponding GO enrichment analysis for these differentially bound genes. Sheet 4 shows RNA-seq counts for genes with increased binding of PolII at the TSS.
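
    A minimal sketch for browsing these workbooks with pandas (sheet layouts are inspected rather than assumed; requires the openpyxl engine):

```python
# Quick sketch for browsing the supplementary Excel workbooks described above
# (pip install pandas openpyxl). Sheet layouts are printed rather than assumed.
import pandas as pd

# Each comparison in the CFC RNA-seq analysis lives on its own sheet (sheets 1-10).
cfc = pd.read_excel(
    "Supplementary Table 4. Differentially expressed gene analysis for CFC samples.xlsx",
    sheet_name=None,            # load every sheet into a dict of DataFrames
)
for name, df in cfc.items():
    print(name, df.shape, list(df.columns)[:5])

# Cluster assignments: sheet 1 = genes with different trajectories, sheet 2 = all DE genes.
clusters = pd.read_excel("Supplementary Table 5. Cluster identification.xlsx", sheet_name=0)
print(clusters.head())
```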

  9. Korea Gas Technology Corporation_Homepage System Standard Glossary

    • data.go.kr
    csv
    Updated May 21, 2025
    Cite
    (2025). Korea Gas Technology Corporation_Homepage System Standard Glossary [Dataset]. https://www.data.go.kr/en/data/15103153/fileData.do
    Explore at:
    Available download formats: csv
    Dataset updated
    May 21, 2025
    License

    https://data.go.kr/ugs/selectPortalPolicyView.do

    Description

    This file is CSV-format data that organizes the standard terminology dictionary used in the homepage system. It contains a total of 363 terms. The fields are:
- Term name: The name of the term used in the system.
- Physical name: The physical field name used when implementing a system such as a database.
- Domain: Indicates the logical data category to which the term belongs.
- Info type: The type of information, providing data classification criteria.
- Data type: Specifies the data storage format of the term (e.g. VARCHAR).
- Code name: Indicates the name when the term is managed as a code value; mostly blank.
- Definition: A definition that explains the meaning of the term.
- Personal information type: Specifies whether the item corresponds to personal information.
- Public/private status: Distinguishes whether the information may be disclosed.
This data can be used to unify terms between systems, standardize data, and establish personal information protection and information disclosure standards.
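
    A minimal loading sketch, assuming a local copy of the CSV (the filename is hypothetical and the encoding is guessed, since Korean open-data CSVs are often CP949 rather than UTF-8):

```python
# Minimal sketch for loading the glossary CSV described above. The column labels
# printed here correspond to the English field names given in the description.
import pandas as pd

PATH = "korea_gas_homepage_glossary.csv"   # hypothetical filename

try:
    glossary = pd.read_csv(PATH, encoding="utf-8")
except UnicodeDecodeError:
    glossary = pd.read_csv(PATH, encoding="cp949")   # common for data.go.kr files

print(len(glossary))               # expected: 363 terms
print(glossary.columns.tolist())   # term name, physical name, domain, info type, data type, ...
```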

  10. Data from: Experimental Evidence for the Influence of Structure and Meaning on Linear Order in the Noun Phrase, 2017-2022

    • beta.ukdataservice.ac.uk
    Updated 2023
    Cite
    UK Data Service (2023). Experimental Evidence for the Influence of Structure and Meaning on Linear Order in the Noun Phrase, 2017-2022 [Dataset]. http://doi.org/10.5255/ukda-sn-856721
    Explore at:
    Dataset updated
    2023
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    datacite
    Description

    Recent work has used artificial language experiments to argue that hierarchical representations drive learners’ expectations about word order in complex noun phrases like these two green cars (Culbertson & Adger 2014; Martin, Ratitamkul, et al. 2019). When trained on a novel language in which individual modifiers come after the Noun, English speakers overwhelmingly assume that multiple nominal modifiers should be ordered such that Adjectives come closest to the Noun, then Numerals, then Demonstratives (i.e., N-Adj-Num-Dem or some subset thereof). This order transparently reflects a constituent structure in which Adjectives combine with Nouns to the exclusion of Numerals and Demonstratives, and Numerals combine with Noun+Adjective units to the exclusion of Demonstratives. This structure has also been claimed to derive frequency asymmetries in complex noun phrase order across languages (e.g., Cinque 2005). However, we show that features of the methodology used in these experiments potentially encourage participants to use a particular metalinguistic strategy that could yield this outcome without implicating constituency structure. Here, we use a more naturalistic artificial language learning task to investigate whether the preference for hierarchy-respecting orders is still found when participants do not use this strategy. We find that the preference still holds, and, moreover, as Culbertson & Adger (2014) speculate, that its strength reflects structural distance between modifiers. It is strongest when ordering Adjectives relative to Demonstratives, and weaker when ordering Numerals relative to Adjectives or Demonstratives relative to Numerals. Our results provide the strongest evidence yet for the psychological influence of hierarchical structure on word order preferences during learning.

  11. e-Sbirka: Data set: CzechVOC code list - term definition

    • data.gov.cz
    json, json-ld
    Updated Jan 1, 2024
    + more versions
    Cite
    Ministerstvo vnitra (2024). e-Sbirka: Data set: CzechVOC code list - term definition [Dataset]. https://data.gov.cz/dataset?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A9-sady%2F00007064%2F1296385456
    Explore at:
    Available download formats: json, json-ld
    Dataset updated
    Jan 1, 2024
    Dataset authored and provided by
    Ministerstvo vnitra
    Description

    Code list of the types of term definitions in CzechVOC. The new arrangement, under which e-Sbírka is launched into full operation separately while e-Legislativa is still being completed, piloted, and gradually put into practice, assumes modifications to the e-Sbírka and e-Legislativa system and therefore also new deployments of the database in the period from 1 January 2024 to 15 January 2025. One consequence of these modifications is, among other things, that until 15 January 2025 the identifiers of the individual fragments making up the structured texts of e-Sbírka acts may change, and partial changes to the data structure may occur. We therefore recommend connecting production external services that use the Open Data only after 15 January 2025.

  12. Opinion on the meaning of the term 'fake news' in Serbia 2018

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Opinion on the meaning of the term 'fake news' in Serbia 2018 [Dataset]. https://www.statista.com/statistics/913764/opinion-on-definition-of-fake-news-in-serbia/
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 22, 2018 - Jul 6, 2018
    Area covered
    Serbia
    Description

    This statistic illustrates the results of a survey regarding the opinion on the meaning of the term fake news in Serbia in 2018. According to data published by IPSOS, ** percent of Serbian adults stated that they personally thought of politicians using the term to support their side of the argument.

  13. Data from: Long-term compost use and high frequency low concentration...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: Long-term compost use and high frequency low concentration fertigation reduce N2O emissions from a California almond orchard [Dataset]. https://catalog.data.gov/dataset/data-from-long-term-compost-use-and-high-frequency-low-concentration-fertigation-reduce-ns
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Using compost as an agricultural amendment offers a means to reduce organic waste, as mandated by California State Bill 1383. Compost application, through the addition of soil organic matter, leads to improvements in soil physical characteristics and soil organic carbon content. Effects of compost application (7-year duration; 38 dry tonnes ha-1) on soil nitrous oxide (N2O), inorganic nitrogen pools, soil temperature and water content, bulk density and total carbon and N content were examined. Soils were also measured for soil pH, electrical conductivity, and total C and N. These findings were compared to the control, which had not received compost application.

The research site was located ~8 km west of Modesto, California (37°37′38.17″ N, 121°5′21.57″ W), on a 10.5-ha almond orchard (Prunus dulcis, 270 m by 395 m). The orchard was replanted in 2012 with Nonpareil cultivars and interplanted with Aldrich and Carmel cultivars, all grafted on Nemaguard peach rootstock [Prunus persica (L.) Bratsch]. Trees were 4.3 m apart along the row, with 6.4 m between rows, and irrigated by surface drip hose with embedded emitters every 3.7 m (0.07 L min−1).

The two treatments, No Compost and Compost (n = 3 replicates per treatment), were studied in the growing season (December 2018–August 2019). All other management was consistent between treatments, representing standard practices for the almond industry in this region. The orchard began high-frequency, low-concentration (HFLC) nutrient management in 2018, and the total amounts of fertilizer N and irrigation were adjusted in response to anticipated tree demand as determined by the grower. In 2019, the orchard received ~195 kg N ha−1 over 14 fertigation events (March–July 2019) through a drip irrigation system using HFLC. These 14 fertilization events ranged from 4.5 kg N ha−1 to 28.0 kg N ha−1.

Findings were compared graphically against data from 5 other studies examining the effects of irrigation and fertigation practices on N2O. Total cumulative emissions were calculated by date and treatment over the growing season. The effects of sampling date, treatment and area spanning the drip zone were analyzed for N2O, soil temperature and water content, water-filled pore space, ammonium and nitrate.
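
    As an illustrative sketch only (the file and column names below are hypothetical, not taken from the dataset), cumulative emissions by date and treatment could be tallied like this:

```python
# Illustrative sketch: running totals of N2O emissions by treatment and date.
# File and column names are hypothetical; a real analysis would also integrate
# fluxes between sampling dates rather than simply summing them.
import pandas as pd

fluxes = pd.read_csv("n2o_fluxes.csv", parse_dates=["date"])  # hypothetical file
# assumed columns: date, treatment ("Compost"/"No Compost"), plot, n2o_flux_g_ha_day

daily = (fluxes.groupby(["treatment", "date"], as_index=False)["n2o_flux_g_ha_day"]
               .mean()                                  # mean across the n = 3 replicates
               .sort_values("date"))
daily["cumulative_g_ha"] = daily.groupby("treatment")["n2o_flux_g_ha_day"].cumsum()
print(daily.groupby("treatment")["cumulative_g_ha"].last())  # season totals per treatment
```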

  14. Short-term PM2.5 exposure and early-readmission risk in Heart Failure Patients

    • gimi9.com
    Updated Nov 15, 2024
    + more versions
    Cite
    (2024). Short-term PM2.5 exposure and early-readmission risk in Heart Failure Patients | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_short-term-pm2-5-exposure-and-early-readmission-risk-in-heart-failure-patients
    Explore at:
    Dataset updated
    Nov 15, 2024
    Description

    In this manuscript EPA researchers used high-resolution (1x1 km) modeled air quality data from a model built by Harvard collaborators to estimate the association between short-term exposure to air pollution and the occurrence of 30-day readmissions in a heart failure population. The heart failure population was taken from patients presenting to a University of North Carolina Healthcare System (UNCHCS) affiliated hospital or clinic that reported electronic health records to the Carolina Data Warehouse for Health (CDW-H). A description of the variables used in this analysis is available in the data dictionary (L:/PRIV/EPHD_CRB/Cavin/CARES/Data Dictonaries/HF short term PM25 and readmissions data dictionary.xlsx) associated with this manuscript. Analysis code is available in L:/PRIV/EPHD_CRB/Cavin/CARES/Project Analytic Code/Lauren Wyatt/DailyPM_HF_readmission. This dataset is not publicly accessible because the data are PII in the form of electronic health records; the data can be accessed with an approved IRB. This dataset is associated with the following publication: Wyatt, L., A. Weaver, J. Moyer, J. Schwartz, Q. Di, D. Diazsanchez, W. Cascio, and C. Ward-Caviness. Short-term PM2.5 exposure and early-readmission risk: A retrospective cohort study in North Carolina Heart Failure Patients. American Heart Journal. Mosby Year Book Incorporated, Orlando, FL, USA, 248: 130-138, (2022).

  15. Data from: Long-Term Agroecosystem Research in the Central Mississippi River Basin: Goodwater Creek Experimental Watershed and Regional Herbicide Water Quality Data

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Data from: Long-Term Agroecosystem Research in the Central Mississippi River Basin: Goodwater Creek Experimental Watershed and Regional Herbicide Water Quality Data [Dataset]. https://catalog.data.gov/dataset/data-from-long-term-agroecosystem-research-in-the-central-mississippi-river-basin-goodwate-a5df5
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Area covered
    Mississippi River, Mississippi River System
    Description

    The GCEW herbicide data were collected from 1991-2010, and are documented at plot, field, and watershed scales. Atrazine concentrations in Goodwater Creek Experimental Watershed (GCEW) were shown to be among the highest of any watershed in the United States based on comparisons using the national Watershed Regressions for Pesticides (WARP) model and by direct comparison with the 112 watersheds used in the development of WARP. This 20-yr-long effort was augmented with a spatially broad effort within the Central Mississippi River Basin encompassing 12 related claypan watersheds in the Salt River Basin, two cave streams on the fringe of the Central Claypan Areas in the Bonne Femme watershed, and 95 streams in northern Missouri and southern Iowa. The research effort on herbicide transport has highlighted the importance of restrictive soil layers with smectitic mineralogy to the risk of transport vulnerability. Near-surface soil features, such as claypans and argillic horizons, result in greater herbicide transport than soils with high saturated hydraulic conductivities and low smectitic clay content. The data set contains concentration, load, and daily discharge data for Devils Icebox Cave and Hunters Cave from 1999 to 2002. The data are available in Microsoft Excel 2010 format. Sheet 1 (Cave Streams Metadata) contains supporting information regarding the length of record, site locations, parameters measured, parameter units, and method detection limits, describes the meaning of zero and blank cells, and briefly describes unit area load computations. Sheet 2 (Devils Icebox Concentration Data) contains concentration data from all samples collected from 1999 to 2002 at the Devils Icebox site for 12 analytes and two computed nutrient parameters. Sheet 3 (Devils Icebox SS Conc Data) contains 15-minute suspended sediment (SS) concentrations estimated from turbidity sensor data for the Devils Icebox site. Sheet 4 (Devils Icebox Load & Discharge Data) contains daily data for discharge, load, and unit area loads for the Devils Icebox site. Sheet 5 (Hunters Cave Concentration Data) contains concentration data from all samples collected from 1999 to 2002 at the Hunters Cave site for 12 analytes and two computed nutrient parameters. Sheet 6 (Hunters Cave SS Conc Data) contains 15-minute SS concentrations estimated from turbidity sensor data for the Hunters Cave site. Sheet 7 (Hunters Cave Load & Discharge Data) contains daily data for discharge, load, and unit area loads for the Hunters Cave site. [Note: To support automated data access and processing, each worksheet has been extracted as a separate, machine-readable CSV file; see the Data Dictionary for descriptions of variables and their concentration units.]

Resources in this dataset:
- README - Metadata (LTAR_GCEW_herbicidewater_qual.xlsx): Defines water quality and sediment load/discharge parameters, abbreviations, time frames, and units as rendered in the Excel file. For additional information, including site information, method detection limits, and methods citations, see the Metadata tab. For definitions used in the machine-readable CSV files, see the Data Dictionary.
- Excel data spreadsheet (c3.jeq2013.12.0516.ds1_.xlsx): Multi-page data spreadsheet containing data as well as metadata from this study. A direct download of the data spreadsheet can be found here: https://dl.sciencesocieties.org/publications/datasets/jeq/C3.JEQ2013.12.0516.ds1/download
- Devils Icebox Concentration Data (DevilsIceboxConcData.csv): Concentrations of herbicides, metabolites, and nutrients (extracted from the Excel tab into machine-readable CSV data).
- Devils Icebox Load and Discharge Data (DevilsIceboxLoad&Discharge.csv): Discharge and unit area loads for herbicides, metabolites, and suspended sediments (extracted from the Excel tab as machine-readable CSV data).
- Devils Icebox Suspended Sediment Concentration Data (DevilsIceboxSSConcData.csv): Suspended sediment concentration data (extracted from the Excel tab as machine-readable CSV data).
- Hunters Cave Load and Discharge Data (HuntersCaveLoad&Discharge.csv): Discharge and unit area loads for herbicides, metabolites, and suspended sediments (extracted from the Excel tab as machine-readable CSV data).
- Hunters Cave Suspended Sediment Concentration Data (HuntersCaveSSConc.csv): Suspended sediment concentration data (extracted from the Excel tab as machine-readable CSV data).
- Data Dictionary for machine-readable CSV files (LTAR_GCEW_herbicidewater_qual.csv): Defines water quality and sediment load/discharge parameters, abbreviations, time frames, and units as implemented in the extracted machine-readable CSV files.
- Hunters Cave Concentration Data (HuntersCaveConcData.csv): Concentrations of herbicides, metabolites, and nutrients (extracted from the Excel tab into machine-readable CSV data).
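
    A minimal sketch for combining the machine-readable CSV extracts listed above (column meanings and units are defined in the Data Dictionary, so none are assumed here):

```python
# Small sketch for combining the concentration CSV extracts from the two cave sites.
# Refer to the Data Dictionary (LTAR_GCEW_herbicidewater_qual.csv) for variable
# definitions and concentration units; column names are not assumed here.
import pandas as pd

files = {
    "Devils Icebox": "DevilsIceboxConcData.csv",
    "Hunters Cave": "HuntersCaveConcData.csv",
}
conc = pd.concat(
    (pd.read_csv(path).assign(site=site) for site, path in files.items()),
    ignore_index=True,
)
print(conc["site"].value_counts())
print(conc.columns.tolist())   # see the Data Dictionary for definitions
```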

  16. Long-Term Agricultural Research (LTAR) network - Meteorological Collection

    • catalog.data.gov
    • +1more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Long-Term Agricultural Research (LTAR) network - Meteorological Collection [Dataset]. https://catalog.data.gov/dataset/long-term-agricultural-research-ltar-network-meteorological-collection-7d719
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The LTAR network maintains stations for standard meteorological measurements including, generally, air temperature and humidity, shortwave (solar) irradiance, longwave (thermal) radiation, wind speed and direction, barometric pressure, and precipitation. Many sites also have extensive comparable legacy datasets. The LTAR scientific community decided that these needed to be made available to the public using a single web source in a consistent manner. To that purpose, each site sent data on a regular schedule, as frequently as hourly, to the National Agricultural Library, which has developed a web service to provide the data to the public in tabular or graphical form. This archive of the LTAR legacy database exports contains meteorological data through April 30, 2021. For current meteorological data, visit the GeoEvent Meteorology Resources page, which provides tools and dashboards to view and access data from the 18 LTAR sites across the United States.

Resources in this dataset:
- Meteorological data (ltar_archive_DB.zip): an export of the meteorological data collected by LTAR sites and ingested by the NAL LTAR application. The export consists of an SQL schema definition file for creating database tables and the data itself. The data is provided in two formats: SQL insert statements (.sql) and CSV files (.csv); use whichever is most convenient. Note that the SQL insert statements take much longer to run, since each row is an individual insert.

Description of the zip files: the ltar_archive*.zip files contain database exports; the schema is a .sql file, and the data is exported as both SQL inserts and CSV for convenience. There is a README in Markdown and PDF in the zips.
- ltar_archive_db_sql_export_20201231.zip and ltar_archive_db_sql_export_20210430.zip: database export of the schema and data for the site, site_station, and met tables as SQL insert statements (data until 2020-12-31 and 2021-04-30, respectively).
- ltar_archive_db_csv_export_20201231.zip and ltar_archive_db_csv_export_20210430.zip: database export of the schema and data for the site, site_station, and met tables as CSV (data until 2020-12-31 and 2021-04-30, respectively).
- ltar_rawcsv_archive.zip: the raw CSV files that were sent to NAL from the LTAR sites/stations (data until 2021-04-30).
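
    A minimal sketch, assuming one CSV per exported table inside the zip, for loading the CSV export into a local SQLite database instead of replaying the row-by-row SQL inserts:

```python
# Illustrative sketch of loading the CSV export into a local SQLite database with
# pandas. File names inside the zip are assumed from the description above; the
# shipped .sql schema targets the original NAL database and may need adjusting.
import sqlite3
import pandas as pd

conn = sqlite3.connect("ltar_met.db")
for table in ("site", "site_station", "met"):
    df = pd.read_csv(f"{table}.csv")          # assumed: one CSV per exported table
    df.to_sql(table, conn, if_exists="replace", index=False)
    print(table, len(df), "rows")

# Bulk-loading the CSVs is much faster than replaying the row-by-row SQL inserts.
print(pd.read_sql("SELECT COUNT(*) AS n FROM met", conn))
conn.close()
```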

  17. Data from: Meaning without words : philosophy and non-verbal communication

    • workwithdata.com
    Updated Jul 14, 2022
    Cite
    Work With Data (2022). Meaning without words : philosophy and non-verbal communication [Dataset]. https://www.workwithdata.com/book/meaning-without-words-philosophy-non-verbal-communication-199339
    Explore at:
    Dataset updated
    Jul 14, 2022
    Dataset authored and provided by
    Work With Data
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Meaning without words : philosophy and non-verbal communication is a book by Peter Gilroy, published in 1996.

  18. Contact Means of the Operating NGOs on the Short-term Food Assistance...

    • data.gov.hk
    Cite
    data.gov.hk, Contact Means of the Operating NGOs on the Short-term Food Assistance Service Teams | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-swd-fcw-list-stfasps
    Explore at:
    Dataset provided by
    data.gov.hk
    Description

    The dataset provides a list of NGOs operating Short-term Food Assistance Service Teams, with their agency names, service areas, and contact means.

  19. Data from: Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students

    • data.mendeley.com
    Updated Oct 24, 2023
    + more versions
    Cite
    Nisar Kangoo (2023). Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students [Dataset]. http://doi.org/10.17632/p2wrs7hm4z.4
    Explore at:
    Dataset updated
    Oct 24, 2023
    Authors
    Nisar Kangoo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains English words in column B. Corresponding to each word, the other columns contain its frequency (fre), length (len), part of speech (PS), the number of undergraduate students who marked it difficult (difficult_ug), and the number of postgraduate students who marked it difficult (difficult_pg). The dataset has a total of 5368 unique words. The words marked as difficult by undergraduate students number 680, and those marked as difficult by postgraduate students number 151; all the remaining 4537 words are easy and hence are not marked as difficult by either undergraduate or postgraduate students. A hyphen (-) in the difficult_ug column means that the word is not present in the text circulated to undergraduate students; likewise, a hyphen (-) in the difficult_pg column means the word is not present in the text circulated to postgraduate students. The data were collected from students in Jammu and Kashmir (a Union Territory of India), latitude and longitude (32.2778° N, 75.3412° E).
    The attached files are as follows: The dataset_english CSV file is the original dataset containing the English words, their length, frequency, part of speech, and the number of undergraduate and postgraduate students who marked the particular words as difficult. The dataset_numerical CSV file contains the original dataset with the string fields transformed into numerical values. The "English language difficulty level measurement - Questionnaire (1-6) & PG1,PG2,PG3,PG4" .docx files contain the questionnaires supplied to students of the college and university to underline difficult words in the English text. The IGNOU English.zip file contains the Indira Gandhi National Open University (IGNOU) English textbooks for graduation and post-graduation students. The texts for the above questionnaires were taken from these IGNOU English textbooks.
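
    A minimal loading sketch (the CSV filename is assumed; column names follow the description above), treating the hyphen as missing data:

```python
# Minimal sketch for loading the word-difficulty table. "-" is read as missing,
# meaning the word was not in the text circulated to that student group.
import pandas as pd

words = pd.read_csv("dataset_english.csv", na_values="-")   # assumed filename
print(len(words))                          # expected: 5368 unique words
print(words[["fre", "len", "PS", "difficult_ug", "difficult_pg"]].head())

# Words marked difficult by at least one undergraduate / postgraduate student;
# compare with the 680 and 151 difficult words reported in the description.
print((words["difficult_ug"] > 0).sum())
print((words["difficult_pg"] > 0).sum())
```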

  20. Opinion on the meaning of the term 'fake news' in Great Britain 2018

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Opinion on the meaning of the term 'fake news' in Great Britain 2018 [Dataset]. https://www.statista.com/statistics/912648/opinion-on-definition-of-fake-news-in-great-britain/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 22, 2018 - Jul 6, 2018
    Area covered
    United Kingdom
    Description

    This statistic illustrates the results of a survey regarding the opinion on the meaning of the term fake news in Great Britain in 2018. According to data published by IPSOS, ** percent of British adults stated that they personally thought of politicians and the media using the term to discredit news they did not agree with.
