100+ datasets found
  1. Data from: Dataset of the manuscript "What is local research? Towards a...

    • zenodo.org
    • produccioncientifica.ugr.es
    bin
    Updated Nov 20, 2024
    Cite
    Victoria Di Césare; Nicolas Robinson-Garcia (2024). Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods" [Dataset]. http://doi.org/10.5281/zenodo.14190851
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Victoria Di Césare; Nicolas Robinson-Garcia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods". In this research article we propose a theoretical and empirical framework of local research, a concept of growing importance due to its far-reaching implications for public policy. Our motivation stems from the lack of clarity surrounding the increasing yet uncritical use of the term in both scientific publications and policy documents, where local research is conceptualized and measured in many ways. A clear understanding of it is crucial for informed decision-making when setting research agendas, allocating funds, and evaluating and rewarding scientists. Our twofold aim is (1) to compare the existing approaches that define and measure local research, and (2) to assess the implications of applying one over another. We first review the perspectives and measures used since the 1970s. Drawing on spatial scientometrics and proximities, we then build a framework that splits the concept into several dimensions: locally informed research, locally situated research, locally relevant research, locally bound research, and locally governed research. Each dimension is composed of a definition and a methodological approach, which we test in 10 million publications from the Dimensions database. Our findings reveal that these approaches measure distinct and sometimes unaligned aspects of local research, with varying effectiveness across countries and disciplines. This study highlights the complex, multifaceted nature of local research. We provide a flexible framework that facilitates the analysis of these dimensions and their intersections, in an attempt to contribute to the understanding and assessment of local research and its role within the production, dissemination, and impact of scientific knowledge.

  2. Data_Sheet_1_Utilizing Text Mining, Data Linkage and Deep Learning in Police...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 1, 2023
    Cite
    George Karystianis; Rina Carines Cabral; Soyeon Caren Han; Josiah Poon; Tony Butler (2023). Data_Sheet_1_Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence.docx [Dataset]. http://doi.org/10.3389/fdgth.2021.602683.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    George Karystianis; Rina Carines Cabral; Soyeon Caren Han; Josiah Poon; Tony Butler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Family and domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims, including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, with one woman being murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and victim are lacking. The New South Wales Police Force attends thousands of FDV events each year and records details as fixed fields (e.g., demographic information for individuals involved in the event) and as text narratives which describe abuse types, victim injuries, threats, and the mental health status of POIs and victims. The information within the narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information from 492,393 FDV event narratives (abuse types, victim injuries, mental illness mentions), we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types (“hands-on,” “hands-off,” “Apprehended Domestic Violence Order (ADVO) breach”). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for “ADVO breach” in a multilabel classification setup, while the binary classification setup generated similar results. “Hands-off” offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups. “Hands-on” offenses benefitted least from the contextual information gained through BERT embeddings; an MLP with categorical embeddings outperformed BERT in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). The encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely increase performance, which can assist those working on FDV and in law enforcement to improve outcomes and better manage FDV events.
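
    To make the multilabel setup concrete, the following is a minimal sketch in Python using a scikit-learn MLP on placeholder features for three binary offense labels. The feature matrix and label columns are stand-ins: the linked police and health records behind the study are confidential and not part of this data sheet.

    # Minimal multilabel sketch in the spirit of the MLP baseline described above.
    # All inputs are synthetic stand-ins, not the study's data.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import accuracy_score, roc_auc_score

    rng = np.random.default_rng(0)
    X = rng.random((1000, 20))              # placeholder event features
    y = rng.integers(0, 2, size=(1000, 3))  # "hands-on", "hands-off", "ADVO breach"

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # MLPClassifier handles multilabel targets given a binary indicator matrix.
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
    clf.fit(X_tr, y_tr)

    pred = clf.predict(X_te)
    print("subset accuracy:", accuracy_score(y_te, pred))
    print("macro ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te), average="macro"))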

  3. Logistics analytics task

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Iron Monger t (2023). Logistics analytics task [Dataset]. https://www.kaggle.com/datasets/ironmongert/logistics-analytics-task
    Explore at:
    Available download formats: zip (293,839 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Iron Monger t
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Track dissent records for a logistics company

    The entities involved in this process include EvidenceLog, DissentRecord, Vendor, DissentCategory, and StatusMaster.

    The EvidenceLog entity represents the evidence that is logged for each dissent record and has attributes such as ID, EvidenceCode, EvidenceRelatedTo, DissentCategoryID, StatusID, VendorID, and more. The DissentRecord entity represents the actual dissent record and has attributes such as ID, DissentCategoryID, VendorID, and EvidenceLogID. The Vendor entity contains vendor details, while the DissentCategory entity contains attributes related to the category of dissent, such as ID, CategoryCode, CategoryName, and more. The StatusMaster entity contains attributes related to the status of the dissent record, such as ID, Status, description, and StatusCode.

    The relationships between these entities are also defined in the ER diagram, such as the many-to-one relationships between DissentRecords and DissentCategory, EvidenceLog, VendorMaster, and StatusMaster. Additionally, EvidenceLog has a one-to-many relationship with EvidenceImages.

    To develop and visualize sample data from this application, you could create sample records for each entity and populate them with data that represents the typical use case of the software. For example, you could create a DissentCategory record with the CategoryName "Damaged Goods", a StatusMaster record with the Status "Resolved", a Vendor record with details of a specific vendor, and an EvidenceLog record with details of the evidence related to the dissent record. You could then link these records together using the appropriate relationships, such as linking the DissentRecord to the DissentCategory, Vendor, and EvidenceLog records.

    To visualize this data, you could create a graphical representation of the ER diagram using a tool such as Lucidchart or draw.io. This would allow you to see the relationships between the entities and how they are linked together. Additionally, you could use a database management tool such as MySQL Workbench to create a database schema based on the ER diagram and populate it with sample data. This would allow you to view the data in a tabular format and run queries to retrieve specific information as needed.
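
    To make the entity relationships above concrete, here is a minimal in-memory sketch in Python using only the attributes named in the description; the real dataset defines additional fields, and the sample vendor name is hypothetical.

    # Minimal sketch of the entities and many-to-one links described above.
    from dataclasses import dataclass

    @dataclass
    class DissentCategory:
        id: int
        category_code: str
        category_name: str

    @dataclass
    class StatusMaster:
        id: int
        status: str
        status_code: str

    @dataclass
    class Vendor:
        id: int
        name: str

    @dataclass
    class EvidenceLog:
        id: int
        evidence_code: str
        dissent_category_id: int
        status_id: int
        vendor_id: int

    @dataclass
    class DissentRecord:
        id: int
        dissent_category_id: int  # many-to-one -> DissentCategory
        vendor_id: int            # many-to-one -> Vendor
        evidence_log_id: int      # many-to-one -> EvidenceLog

    # Sample records mirroring the worked example in the description.
    category = DissentCategory(1, "DG", "Damaged Goods")
    status = StatusMaster(1, "Resolved", "RES")
    vendor = Vendor(1, "Acme Freight")  # hypothetical vendor
    evidence = EvidenceLog(1, "EV-001", category.id, status.id, vendor.id)
    record = DissentRecord(1, category.id, vendor.id, evidence.id)
    print(record)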

  4. Synthetic Administrative Data: Census 1991, 2023

    • datacatalogue.ukdataservice.ac.uk
    Updated Feb 21, 2024
    Cite
    Shlomo, N, University of Manchester; Kim, M, University of Manchester (2024). Synthetic Administrative Data: Census 1991, 2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-856310
    Explore at:
    Dataset updated
    Feb 21, 2024
    Authors
    Shlomo, N, University of Manchester; Kim, M, University of Manchester
    Area covered
    United Kingdom
    Description

    We create a synthetic administrative dataset, to be used in the development of the R package for calculating quality indicators for administrative data (see: https://github.com/sook-tusk/qualadmin), that mimics the properties of a real administrative dataset according to specifications by the ONS. Taking over 1 million records from a synthetic 1991 UK census dataset, we deleted records, moved records to a different geography and duplicated records to a different geography according to pre-specified proportions for each broad ethnic group (White, Non-white) and gender (males, females). The final size of the synthetic administrative data was 1,033,664 individuals.
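
    A rough illustration of the perturbation procedure described above (deleting, moving and duplicating records by group-specific proportions) is sketched below; the column names and proportions are hypothetical placeholders, not the ONS specification.

    # Illustrative sketch: delete, move and duplicate records by group-specific
    # proportions. Columns and proportions are hypothetical, not the ONS spec.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    census = pd.DataFrame({
        "person_id": range(10_000),
        "ethnic_group": rng.choice(["White", "Non-white"], 10_000, p=[0.9, 0.1]),
        "sex": rng.choice(["M", "F"], 10_000),
        "geography": rng.choice([f"LA{i:03d}" for i in range(50)], 10_000),
    })

    # (delete, move, duplicate) proportions per broad group
    props = {
        ("White", "M"): (0.02, 0.03, 0.01),
        ("White", "F"): (0.02, 0.02, 0.01),
        ("Non-white", "M"): (0.04, 0.05, 0.02),
        ("Non-white", "F"): (0.03, 0.04, 0.02),
    }

    areas = census["geography"].unique()
    frames = []
    for (eth, sex), (p_del, p_move, p_dup) in props.items():
        grp = census[(census.ethnic_group == eth) & (census.sex == sex)]
        grp = grp.sample(frac=1 - p_del, random_state=0).copy()        # delete
        movers = grp.sample(frac=p_move, random_state=1).index
        grp.loc[movers, "geography"] = rng.choice(areas, len(movers))  # move
        dups = grp.sample(frac=p_dup, random_state=2).copy()           # duplicate
        dups["geography"] = rng.choice(areas, len(dups))
        frames.append(pd.concat([grp, dups]))

    admin = pd.concat(frames, ignore_index=True)
    print(len(admin), "records in the synthetic administrative file")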

    National Statistical Institutes (NSIs) are directing resources into advancing the use of administrative data in official statistics systems. This is a top priority for the UK Office for National Statistics (ONS) as they are undergoing transformations in their statistical systems to make more use of administrative data for future censuses and population statistics. Administrative data are defined as secondary data sources since they are produced by other agencies as a result of an event or a transaction relating to administrative procedures of organisations, public administrations and government agencies. Nevertheless, they have the potential to become important data sources for the production of official statistics by significantly reducing the cost and burden of response and improving the efficiency of such systems. Embedding administrative data in statistical systems is not without costs and it is vital to understand where potential errors may arise. The Total Administrative Data Error Framework sets out all possible sources of error when using administrative data as statistical data, depending on whether it is a single data source or integrated with other data sources such as survey data. For a single administrative data source, one of the main sources of error is coverage and representation of the target population of interest. This is particularly relevant when administrative data is delivered over time, such as tax data for maintaining the Business Register. For sub-project 1 of this research project, we develop quality indicators that allow the statistical agency to assess whether the administrative data is representative of the target population and which sub-groups may be missing or over-covered. This is essential for producing unbiased estimates from administrative data.

    Another priority at statistical agencies is to produce a statistical register for population characteristic estimates, such as employment statistics, from multiple sources of administrative and survey data. Using administrative data to build a spine, survey data can be integrated using record linkage and statistical matching approaches on a set of common matching variables. This will be the topic for sub-project 2, which will be split into several topics of research. The first topic is whether adding statistical predictions and correlation structures improves the linkage and data integration. The second topic is to research a mass imputation framework for imputing missing target variables in the statistical register where the missing data may be due to multiple underlying mechanisms. The third topic will therefore aim to improve the mass imputation framework to mitigate against possible measurement errors, for example by adding benchmarks and other constraints into the approaches. On completion of a statistical register, estimates for key target variables at local areas can easily be aggregated. However, it is essential to also measure the precision of these estimates through mean square errors, and this will be the fourth topic of the sub-project. Finally, this new way of producing official statistics is compared to the more common method of incorporating administrative data through survey weights and model-based estimation approaches. In other words, we evaluate whether it is better 'to weight' or 'to impute' for population characteristic estimates - a key question under investigation by survey statisticians in the last decade.

  5. TwinsUK

    • healthdatagateway.org
    unknown
    Updated Oct 8, 2024
    Cite
    TwinsUK is funded by the Wellcome Trust, Medical Research Council, Versus Arthritis, European Union Horizon 2020, Chronic Disease Research Foundation (CDRF), Zoe Global Ltd and the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. (2024). TwinsUK [Dataset]. https://healthdatagateway.org/dataset/728
    Explore at:
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    Medical Research Council (http://mrc.ukri.org/)
    TwinsUK (http://www.twinsuk.ac.uk/)
    Authors
    TwinsUK is funded by the Wellcome Trust, Medical Research Council, Versus Arthritis, European Union Horizon 2020, Chronic Disease Research Foundation (CDRF), Zoe Global Ltd and the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London.
    License

    https://twinsuk.ac.uk/resources-for-researchers/access-our-data/

    Description

    The TwinsUK cohort (https://twinsuk.ac.uk/), set up in 1992, is a major volunteer-based genomic epidemiology resource with longitudinal deep genomic and phenomic data from over 15,000 adult twins (18+) from across the UK who are highly engaged and recallable. The cohort is predominantly female (80%) for historical reasons. It is one of the most deeply characterised adult twin cohorts in the world, providing a rich platform for scientists to research health and ageing longitudinally. There are over 700,000 biological samples stored and data collected on twins with repeat measures at multiple timepoints. Extremely large datasets (billions of data points) have been generated for each TwinsUK participant over 30 years, including phenotypes from questionnaires, multiple clinical visits, and record linkage, and genetic and ‘omic data from biological samples. TwinsUK ensures derived datasets from raw data are returned by collaborators to enhance the resource. TwinsUK also holds a wide range of laboratory samples, including plasma, serum, DNA, faecal microbiome and tissue (skin, fat, colonic biopsies) within HTA-regulated facilities at King's College London.

    More recently, postal and at-home collection strategies have allowed sample collection from frail twins, from the whole cohort for COVID-19 studies, and from new twin recruits. The cohort is recallable either on a four-year longitudinal sweep visit or based on diagnosis or genotype.

    More than 1,000 data access collaborations and 250,000 samples have been shared with external researchers, resulting in over 800 publications since 2012.

    TwinsUK is now working to link to twins’ official health, education and environmental records for health research purposes, which will further enhance the resource.

  6. Data from: Citation network data sets for 'Oxytocin – a social peptide?...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 5, 2022
    Cite
    Leng, Rhodri Ivor (2022). Citation network data sets for 'Oxytocin – a social peptide? Deconstructing the evidence' [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5578956
    Explore at:
    Dataset updated
    Jun 5, 2022
    Dataset provided by
    University of Edinburgh
    Authors
    Leng, Rhodri Ivor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This note describes the data sets used for all analyses contained in the manuscript 'Oxytocin - a social peptide?’[1] that is currently under review.

    Data Collection

    The data sets described here were originally retrieved from Web of Science (WoS) Core Collection via the University of Edinburgh’s library subscription [2]. The aim of the original study for which these data were gathered was to survey peer-reviewed primary studies on oxytocin and social behaviour. To capture relevant papers, we used the following query:

    TI = (“oxytocin” OR “pitocin” OR “syntocinon”) AND TS = (“social*” OR “pro$social” OR “anti$social”)

    The final search was performed on 13 September 2021. This returned a total of 2,747 records, of which 2,049 were classified by WoS as ‘articles’. Given our interest in primary studies only – articles reporting original data – we excluded all other document types. We further excluded all articles sub-classified as ‘book chapters’ or as ‘proceeding papers’ in order to limit our analysis to primary studies published in peer-reviewed academic journals. This reduced the set to 1,977 articles. All of these were published in the English language, and no further language refinements were necessary.

    All available metadata on these 1,977 articles was exported as plain text ‘flat’ format files in four batches, which we later merged together via Notepad++. Upon manual examination, we discovered examples of papers classified as ‘articles’ by WoS that were, in fact, reviews. To further filter our results, we searched all available PMIDs in PubMed (1,903 had associated PMIDs; ~96% of the set). We then filtered the results to identify all records classified as ‘review’, ‘systematic review’, or ‘meta-analysis’, identifying 75 records [3]. After examining a sample and agreeing with the PubMed classification, these were removed from our dataset, leaving a total of 1,902 articles.

    From these data, we constructed two datasets by parsing out the relevant reference data via the Sci2 Tool [4]. First, we constructed a ‘node-attribute-list’ by linking unique reference strings (‘Cite Me As’ column in the WoS data files) to unique identifiers; we then parsed into this dataset information on the identity of each paper, including the title of the article, all authors, journal of publication, year of publication, total citations as recorded by WoS, and WoS accession number. Second, we constructed an ‘edge-list’ that records the citations from a citing paper in the ‘Source’ column and identifies the cited paper in the ‘Target’ column, using the unique identifiers described previously to link these data to the node-attribute-list.

    We then constructed a network in which papers are nodes, and citation links between nodes are directed edges. We used Gephi Version 0.9.2 [5] to manually clean these data by merging duplicate references caused by different reference formats or by referencing errors. To do this, we needed to retain all retrieved records (1,902) as well as all of their references to papers, whether these were included in our original search or not. In total, this produced a network of 46,633 nodes (unique reference strings) and 112,520 edges (citation links). Thus, the average reference list size of these articles is ~59 references. The mean indegree (within-network citations) is 2.4 (median is 1) for the entire network, reflecting a great diversity in referencing choices among our 1,902 articles.

    After merging duplicates, we then restricted the network to include only the articles fully retrieved (1,902), and retained only those that were connected together by citation links in a large interconnected network (i.e. the largest component). In total, 1,892 (99.5%) of our initial set were connected together via citation links, meaning a total of ten papers were removed from the following analysis – these were neither connected to the largest component, nor did they form connections with one another (i.e. they were ‘isolates’).

    This left us with a network of 1,892 nodes connected together by 26,019 edges. It is this network that is described by the ‘node-attribute-list’ and ‘edge-list’ provided here. This network has a mean in-degree of 13.76 (median in-degree of 4). By restricting our analysis in this way, we lose 44,741 unique references (96%) and 86,501 citations (77%) from the full network, but retain a set of articles tightly knitted together, all of which have been fully retrieved due to possessing certain terms related to oxytocin AND social behaviour in their title, abstract, or associated keywords.

    Before moving on, we calculated indegree for all nodes in this network – this counts the number of citations to a given paper from other papers within this network – and have included this in the node-attribute-list. We further clustered this network via modularity maximisation via the Leiden algorithm [6]. We set the algorithm to resolution 1, and allowed the algorithm to run over 100 iterations and 100 restarts. This gave Q=0.43 and identified seven clusters, which we describe in detail within the body of the paper. We have included cluster membership as an attribute in the node-attribute-list.
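
    For readers who prefer a scripting environment over Gephi, a minimal Python sketch of loading the two files described below into a directed network and recomputing in-degree might look as follows (column names follow the data description; the Leiden clustering itself is not reproduced here, since cluster membership is already provided as a node attribute):

    # Minimal sketch: load the node and edge lists described below into a
    # directed citation network and recompute in-degree.
    import pandas as pd
    import networkx as nx

    nodes = pd.read_csv("OTSOC-node-attribute-list.csv")
    edges = pd.read_csv("OTSOC-edge-list.csv")

    G = nx.DiGraph()
    G.add_nodes_from(nodes["Id"])
    G.add_edges_from(zip(edges["Source"], edges["Target"]))

    indegree = pd.Series(dict(G.in_degree()), name="indegree_recomputed")
    print(indegree.describe())  # mean ~13.8, median 4, per the description above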

    Data description

    We include here two datasets: (i) ‘OTSOC-node-attribute-list.csv’ consists of the attributes of 1,892 primary articles retrieved from WoS that include terms indicating a focus on oxytocin and social behaviour; (ii) ‘OTSOC-edge-list.csv’ records the citations between these papers. Together, these can be imported into a range of different software for network analysis; however, we have formatted these for ease of upload into Gephi 0.9.2. Below, we detail their contents:

    1. ‘OTSOC-node-attribute-list.csv’ is a comma-separated values file that contains all node attributes for the citation network (n=1,892) analysed in the paper. The columns refer to:

    Id, the unique identifier

    Label, the reference string of the paper to which the attributes in this row correspond. This is taken from the ‘Cite Me As’ column from the original WoS download. The reference string is in the following format: last name of first author, publication year, journal, volume, start page, and DOI (if available).

    Wos_id, unique Web of Science (WoS) accession number. These can be used to query WoS to find further data on all papers via the ‘UT= ’ field tag.

    Title, paper title.

    Authors, all named authors.

    Journal, journal of publication.

    Pub_year, year of publication.

    Wos_citations, total number of citations recorded by WoS Core Collection to a given paper as of 13 September 2021

    Indegree, the number of within network citations to a given paper, calculated for the network shown in Figure 1 of the manuscript.

    Cluster, provides the cluster membership number as discussed within the manuscript (Figure 1). This was established via modularity maximisation via the Leiden algorithm (Res 1; Q=0.43|7 clusters)

    2. ‘OTSOC-edge-list.csv’ is a comma-separated values file that contains all citation links between the 1,892 articles (n=26,019). The columns refer to:

    Source, the unique identifier of the citing paper.

    Target, the unique identifier of the cited paper.

    Type, edges are ‘Directed’, and this column tells Gephi to regard all edges as such.

    Syr_date, this contains the date of publication of the citing paper.

    Tyr_date, this contains the date of publication of the cited paper.

    Software recommended for analysis

    Gephi version 0.9.2 was used for the visualisations within the manuscript, and both files can be read into Gephi without modification.

    Notes

    [1] Leng, G., Leng, R. I., Ludwig, M. (Submitted). Oxytocin – a social peptide? Deconstructing the evidence.

    [2] Edinburgh University’s subscription to Web of Science covers the following databases: (i) Science Citation Index Expanded, 1900-present; (ii) Social Sciences Citation Index, 1900-present; (iii) Arts & Humanities Citation Index, 1975-present; (iv) Conference Proceedings Citation Index- Science, 1990-present; (v) Conference Proceedings Citation Index- Social Science & Humanities, 1990-present; (vi) Book Citation Index– Science, 2005-present; (vii) Book Citation Index– Social Sciences & Humanities, 2005-present; (viii) Emerging Sources Citation Index, 2015-present.

    [3] For those interested, the following PMIDs were identified as ‘articles’ by WoS, but as ‘reviews’ by PubMed: ‘34502097’ ‘33400920’ ‘32060678’ ‘31925983’ ‘31734142’ ‘30496762’ ‘30253045’ ‘29660735’ ‘29518698’ ‘29065361’ ‘29048602’ ‘28867943’ ‘28586471’ ‘28301323’ ‘27974283’ ‘27626613’ ‘27603523’ ‘27603327’ ‘27513442’ ‘27273834’ ‘27071789’ ‘26940141’ ‘26932552’ ‘26895254’ ‘26869847’ ‘26788924’ ‘26581735’ ‘26548910’ ‘26317636’ ‘26121678’ ‘26094200’ ‘25997760’ ‘25631363’ ‘25526824’ ‘25446893’ ‘25153535’ ‘25092245’ ‘25086828’ ‘24946432’ ‘24637261’ ‘24588761’ ‘24508579’ ‘24486356’ ‘24462936’ ‘24239932’ ‘24239931’ ‘24231551’ ‘24216134’ ‘23955310’ ‘23856187’ ‘23686025’ ‘23589638’ ‘23575742’ ‘23469841’ ‘23055480’ ‘22981649’ ‘22406388’ ‘22373652’ ‘22141469’ ‘21960250’ ‘21881219’ ‘21802859’ ‘21714746’ ‘21618004’ ‘21150165’ ‘20435805’ ‘20173685’ ‘19840865’ ‘19546570’ ‘19309413’ ‘15288368’ ‘12359512’ ‘9401603’ ‘9213136’ ‘7630585’

    [4] Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. Stable URL: https://sci2.cns.iu.edu

    [5] Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media (ICWSM).

    [6] Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9, 5233.

  7. Career promotions, research publications, Open Access dataset

    • ordo.open.ac.uk
    zip
    Updated Feb 28, 2022
    Cite
    Matteo Cancellieri; Nancy Pontika; David Pride; Petr Knoth; Hannah Metzler; Antonia Correia; Helene Brinken; Bikash Gyawali (2022). Career promotions, research publications, Open Access dataset [Dataset]. http://doi.org/10.21954/ou.rd.19228785.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 28, 2022
    Dataset provided by
    The Open University
    Authors
    Matteo Cancellieri; Nancy Pontika; David Pride; Petr Knoth; Hannah Metzler; Antonia Correia; Helene Brinken; Bikash Gyawali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a compilation of processed data on citations and references for research papers, including their author, institution and open access information, for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020. Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available in the public ON-MERRIT D3.1 deliverable report.

    The dataset is a combination of two different data sources: one part is a dataset created by analysing promotion policies across the target countries, while the second part is a set of data points available to understand publishing behaviour. To facilitate the analysis, the dataset is organised in the following seven folders:

    PRT
    • The file "PRT_policies.csv" contains the information extracted from promotion, review and tenure (PRT) policies.

    Q1: What % of papers coming from a university are Open Access?
    • Dataset name format: oa_status_countryname_papers.csv
    • Dataset contents: Open Access (OA) status of all papers of all the universities listed in the Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least one OA link available. OA links are collected using the CORE Discovery API.
    • Important considerations: Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. The service we used to recognise whether a paper is OA, CORE Discovery, does not contain entries for all paper ids in MAG, so some records will have neither a true nor a false value for the is_OA field. Only records marked true for is_OA can be said to be OA; those with a false or missing value have unknown status (i.e. not necessarily closed access).

    Q2: How are papers published by the selected universities distributed across the three scientific disciplines of our choice?
    • Dataset name format: fsid_countryname_papers.csv
    • Dataset contents: For the given country, all papers for all the universities listed in THEWUR, with the field of study they belong to.
    • Important considerations: MAG can associate a paper with multiple fieldofstudyid values; if a paper belongs to more than one of our fieldofstudyid values, separate records were created for the paper with each of them. MAG assigns a fieldofstudyid to every paper with a score; we preserve only those records whose score is more than 0.5 for any fieldofstudyid. Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to; papers with authorship from multiple universities are counted once towards each of the universities concerned.

    Q3: What is the gender distribution in authorship of papers published by the universities?
    • Dataset name format: author_gender_countryname_papers.csv
    • Dataset contents: All papers with their author names for all the universities listed in THEWUR.
    • Important considerations: When there are multiple collaborators (authors) for the same paper, only the records for collaborators from within the selected universities are preserved. An external script was executed to determine the gender of the authors.

    Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.
    • Dataset name format: author_ids_countryname_papers.csv
    • Dataset contents: For a given country, all papers for authors with their publication year, for all the universities listed in THEWUR.
    • Important considerations: When there are multiple collaborators (authors) for the same paper, only the records for collaborators from within the selected universities are preserved. Calculating staff seniority can be achieved in various ways; the most straightforward option is to calculate it as academic_age = MAX(year) - MIN(year) for each authorid (a minimal pandas sketch is given after this description).

    Q5: Citation counts (incoming) for OA vs non-OA papers published by the university.
    • Dataset name format: cc_oa_countryname_papers.csv
    • Dataset contents: OA status and OA links for all papers of all the universities listed in THEWUR and, for each of those papers, the count of incoming citations available in MAG.
    • Important considerations: CORE Discovery was used to establish the OA status of papers. Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Only records marked true for is_OA can be said to be OA; those with a false or missing value have unknown status (i.e. not necessarily closed access).

    Q6: Count of OA vs non-OA references (outgoing) for all papers published by universities.
    • Dataset name format: rc_oa_countryname_-papers.csv
    • Dataset contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
    • Important considerations: CORE Discovery was used to establish the OA status of the papers being referenced. Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to; papers with authorship from multiple universities are counted once towards each of the universities concerned.

    Additional files:
    • fieldsofstudy_mag.csv: a dump of the fieldsofstudy table of MAG, mapping each id to its actual field of study name.
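
    Following up on the calculation noted under Q4, here is a minimal pandas sketch of two of the computations mentioned above (academic age for Q4 and the known-OA share for Q1); the choice of country file and the "institution" column are illustrative assumptions, and the exact CSV headers may differ.

    # Sketch of two calculations described above: academic age per author (Q4)
    # and the share of known-OA papers per institution (Q1).
    import pandas as pd

    # Q4: academic_age = MAX(year) - MIN(year) per authorid
    papers = pd.read_csv("author_ids_austria_papers.csv")
    academic_age = (papers.groupby("authorid")["year"]
                          .agg(lambda y: y.max() - y.min())
                          .rename("academic_age"))

    # Q1: only is_OA == True is known OA; False/missing means unknown status.
    oa = pd.read_csv("oa_status_austria_papers.csv")
    oa_share = oa.groupby("institution")["is_OA"].apply(lambda s: s.eq(True).mean())

    print(academic_age.describe())
    print(oa_share.sort_values(ascending=False).head())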

  8. NOAA Fundamental Climate Data Records (FCDR)

    • registry.opendata.aws
    Updated Jul 13, 2021
    Cite
    NOAA (2021). NOAA Fundamental Climate Data Records (FCDR) [Dataset]. https://registry.opendata.aws/noaa-cdr-fundamental/
    Explore at:
    Dataset updated
    Jul 13, 2021
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    NOAA's Climate Data Records (CDRs) are robust, sustainable, and scientifically sound climate records that provide trustworthy information on how, where, and to what extent the land, oceans, atmosphere and ice sheets are changing. These datasets are thoroughly vetted time series measurements with the longevity, consistency, and continuity to assess and measure climate variability and change. NOAA CDRs are vetted using standards established by the National Research Council (NRC).

    Climate Data Records are created by merging data from surface, atmosphere, and space-based systems across decades. NOAA’s Climate Data Records provide authoritative and traceable long-term climate records. NOAA developed CDRs by applying modern data analysis methods to historical global satellite data. This process can clarify the underlying climate trends within the data and allows researchers and other users to identify economic and scientific value in these records. NCEI maintains and extends CDRs by applying the same methods to present-day and future satellite measurements.

    Fundamental CDRs are composed of sensor data (e.g. calibrated radiances, brightness temperatures) that have been improved and quality controlled over time, together with ancillary calibration data.

  9. Dataset : Business Intelligence Research Trends

    • kaggle.com
    zip
    Updated Nov 14, 2024
    Cite
    MUHAMMAD AKMAL HAKIM (2024). Dataset : Business Intelligence Research Trends [Dataset]. https://www.kaggle.com/datasets/akma1xz/dataset-business-intelligence-research-trends
    Explore at:
    Available download formats: zip (244,328 bytes)
    Dataset updated
    Nov 14, 2024
    Authors
    MUHAMMAD AKMAL HAKIM
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Overview

    This dataset presents a meticulously compiled collection of 387 academic publications that explore various aspects of social media and business intelligence. The dataset includes detailed metadata about each publication, such as titles, authorship, abstracts, publication years, article types, and the journals or conferences where they were published. Citations and research areas are also included, making this dataset a valuable resource for bibliometric analysis, trend detection, and literature reviews in the fields of social media analytics, sentiment analysis, business intelligence, and related disciplines.

    Content

    The dataset comprises 15 columns, each capturing specific attributes of the research papers. Below is a description of each column:

    • ID: A unique identifier for each record in the dataset.
    • Title: The title of the academic paper.
    • DOI: The Digital Object Identifier, which provides a permanent link to the publication.
    • Author: List of authors who contributed to the paper.
    • Abstract: A summary of the research paper, providing insights into the study's objectives, methods, and findings.
    • Year: The year the paper was published.
    • Article Type: Indicates the type of publication (e.g., Proceedings Paper, Article, Book Chapter).
    • Publication Name: The name of the journal or conference where the paper was published.
    • Number-of-Cited-References: The number of references cited in the paper.
    • Times Cited: The number of times the paper has been cited by other works.
    • Research Areas: The general research area(s) the paper pertains to (e.g., Computer Science, Engineering).
    • WOS Category: Specific categories or subfields relevant to Web of Science classification.
    • WOS Index: The index within Web of Science where the paper is listed.
    • Keywords: Keywords provided by the authors to describe the main topics of the paper.
    • Keyword Plus: Additional keywords derived from the titles of the paper’s cited references.

    Applications

    This dataset can be utilized for a variety of purposes, including but not limited to:

    • Trend Analysis: Identify emerging trends and popular topics in social media and business intelligence research.
    • Citation Analysis: Analyze citation patterns to determine the impact and relevance of specific publications.
    • Collaborative Networks: Map out authorship and institutional collaboration trends.
    • Text Mining: Perform text mining on abstracts and keywords to uncover latent themes and topics.
    • Research Evaluation: Conduct bibliometric evaluations to assess the productivity and impact of researchers and institutions in the field.

    Data Collection and Preprocessing

    The dataset was curated by extracting bibliometric data from Web of Science (WOS), ensuring the inclusion of comprehensive and high-quality metadata. All records have been standardized for consistency and completeness to facilitate easier analysis.
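
    A minimal sketch of how the trend, citation and keyword analyses above could be run with pandas, using the column names from the Content section; the CSV file name and the ";" keyword separator are assumptions.

    # Minimal bibliometric sketch using the columns listed above.
    import pandas as pd

    df = pd.read_csv("business_intelligence_research_trends.csv")

    publications_per_year = df.groupby("Year")["ID"].count()          # trend analysis

    top_cited = (df.sort_values("Times Cited", ascending=False)       # citation analysis
                   [["Title", "Year", "Times Cited"]].head(10))

    keyword_counts = (df["Keywords"].dropna()                         # simple text mining
                        .str.split(";").explode().str.strip().str.lower()
                        .value_counts().head(20))

    print(publications_per_year)
    print(top_cited)
    print(keyword_counts)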

  10. CD4 count recovery and associated factors among individuals enrolled in the...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 3, 2023
    Cite
    Tendesayi Kufa; Zara Shubber; William MacLeod; Simbarashe Takuva; Sergio Carmona; Jacob Bor; Marelize Gorgens; Yogan Pillay; Adrian Puren; Jeffrey W. Eaton; Nicole Fraser-Hurt (2023). CD4 count recovery and associated factors among individuals enrolled in the South African antiretroviral therapy programme: An analysis of national laboratory based data [Dataset]. http://doi.org/10.1371/journal.pone.0217742
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tendesayi Kufa; Zara Shubber; William MacLeod; Simbarashe Takuva; Sergio Carmona; Jacob Bor; Marelize Gorgens; Yogan Pillay; Adrian Puren; Jeffrey W. Eaton; Nicole Fraser-Hurt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Africa
    Description

    Background: We describe CD4 count recovery among HIV positive individuals who initiated antiretroviral therapy (ART) with and without severe immune suppression, using complete laboratory data from South Africa’s national HIV treatment programme between 2010 and 2014, and discuss implications for CD4 count monitoring.

    Methods: Retrospective analysis of routinely collected laboratory data from South Africa’s National Health Laboratory Service (NHLS). A probabilistic record linkage algorithm was used to create a cohort of HIV positive individuals who initiated ART between 2010 and 2014, based on the timing of CD4 count and viral load measurements. A CD4 count < 50 cells/μl at ART initiation was considered severe immunosuppression. A multivariable piecewise mixed-effects linear regression model adjusting for age, gender, year of starting ART, viral suppression in follow up and province was used to predict CD4 counts during follow up.

    Results: 1,070,900 individuals had evidence of starting ART during 2010–2014 and met the criteria for inclusion in the cohort: 46.6% started ART with CD4 < 200 cells/μl and 10.1% with CD4 < 50 cells/μl. For individuals with CD4 counts < 200 cells/μl, predicted CD4 counts > 200 cells/μl, > 350 cells/μl and > 500 cells/μl corresponded with mean follow-up durations of 1.5 years (standard deviation [s.d.] 1.1), 1.9 years (s.d. 1.2) and 2.1 years (s.d. 1.3). For those with CD4 counts < 50 cells/μl, recovery to the same thresholds corresponded with mean follow-up durations of 2.5 years (s.d. 0.9), 4.4 years (s.d. 0.4) and 5.0 years (s.d. 0.1). CD4 count recovery varied mostly with duration on ART, CD4 count at the start of ART and gender.

    Conclusion: For individuals starting ART with severe immunosuppression, CD4 recovery to 200 cells/μl did not occur, or took longer than 12 months, for a significant proportion. CD4 monitoring and the interventions recommended for advanced HIV disease should continue until full recovery.
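
    As a rough, simplified illustration of the kind of mixed-effects model described in the Methods (not the authors' exact piecewise specification), a statsmodels sketch with hypothetical variable names might look like this:

    # Simplified illustration of a mixed-effects model of CD4 recovery on ART.
    # Not the authors' exact piecewise specification; file and variable names
    # are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("cd4_follow_up.csv")  # one row per CD4 measurement per person

    # Random intercept per individual; fixed effects roughly mirroring the
    # adjustment set described above.
    model = smf.mixedlm(
        "cd4_count ~ years_on_art + age + C(gender) + C(art_start_year) + virally_suppressed",
        data=df,
        groups=df["patient_id"],
    )
    result = model.fit()
    print(result.summary())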

  11. h

    HZDR Data Management Strategy — Top-Level Architecture

    • rodare.hzdr.de
    pdf
    Updated Feb 23, 2023
    Cite
    Knodel, Oliver; Gruber, Thomas; Kelling, Jeffrey; Lokamani, Mani; Müller, Stefan; Pape, David; Juckeland, Guido (2023). HZDR Data Management Strategy — Top-Level Architecture [Dataset]. http://doi.org/10.14278/rodare.2513
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    Helmholtz-Zentrum Dresden - Rossendorf
    Authors
    Knodel, Oliver; Gruber, Thomas; Kelling, Jeffrey; Lokamani, Mani; Müller, Stefan; Pape, David; Juckeland, Guido
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This data publication contains an overview of the Top-Level Architecture of the proposed HZDR Data Management Strategy, with additional descriptions of the various systems and services.

    The Helmholtz-Zentrum Dresden-Rossendorf (HZDR) pursues a comprehensive data management strategy that is designed as an architecture of services to describe and manage scientific experiments in a sustainable manner. This strategy is based on the FAIR principles and aims to ensure the findability, accessibility, interoperability and reusability of research data.
    The HZDR's comprehensive data management covers all phases of the data lifecycle: from planning and collection to analysis, storage, publication and archiving. Each phase is supported by specialised services and tools that help scientists to efficiently collect, store and share their data. These services include:

    • Electronic lab notebook: for the digital recording and management of lab experiments and data.
    • Data management plans (RDMO): For planning and organising data management during a research project.
    • (Time Series) Databases: For structured storage and retrieval of research data.
    • File systems: For storing and managing files in a controlled environment.
    • Publication systems (ROBIS, RODARE): For the publication and accessibility of research data and results.
    • Metadata catalogue (SciCat): For describing data in a wide variety of subsystems using searchable metadata
    • Repositories (Helmholtz Codebase): For archiving, version control and provision of software, special data sets and workflows.
    • Proposal Management System (GATE): For the administration of project proposals and approvals.

    The superordinate web service HELIPORT plays a central role here. HELIPORT acts as a gateway and connecting service that links all components of the Data Management Strategy and describes them in a sustainable manner. HELIPORT ensures standardised access to the various services and tools, which considerably simplifies collaboration and the exchange of data.

  12. Movement cost surface - A landscape connectivity analysis for the coastal...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 11, 2025
    Cite
    U.S. Fish and Wildlife Service (2025). Movement cost surface - A landscape connectivity analysis for the coastal marten (Martes caurina humboldtensis) [Dataset]. https://catalog.data.gov/dataset/movement-cost-surface-a-landscape-connectivity-analysis-for-the-coastal-marten-martes-caur
    Explore at:
    Dataset updated
    Nov 11, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service
    Description

    This Movement Cost Surface (raster) is an intermediary modeling product that was produced by the Linkage Mapper tool (McRae and Kavanagh 2011) in the process of developing Least-Cost Paths (LCPs) and Least-Cost Corridors for use in our coastal marten connectivity model. It is derived from two other datasets (ResistanceSurface and PrimaryModel_HabitatCores) and was produced using the Linkage Mapper parameters defined in the Lineage section of the geospatial metadata record. More specifically, this is an intermediary product in which the resistance surface and habitat cores have been converted into a movement cost surface to show the cumulative movement cost (in cost-weighted meters) of moving away from each core. The easiest way to understand where this intermediary dataset fits into the broader Linkage Mapper modeling process is to review the Linkage Mapper User Guide, available on circuitscape.org (McRae and Kavanagh 2016). Refer to the PrimaryModel_LeastCostCorridors, ResistanceSurface, and PrimaryModel_HabitatCores metadata records for additional context.

  13. Data from: Decoding Wayfinding: Analyzing Wayfinding Processes in the...

    • researchdata.tuwien.at
    html, pdf, zip
    Updated Mar 19, 2025
    Cite
    Negar Alinaghi; Ioannis Giannopoulos (2025). Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment [Dataset]. http://doi.org/10.48436/m2ha4-t1v92
    Explore at:
    Available download formats: html, zip, pdf
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    TU Wien
    Authors
    Negar Alinaghi; Ioannis Giannopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    How To Cite?

    Alinaghi, N., Giannopoulos, I., Kattenbeck, M., & Raubal, M. (2025). Decoding wayfinding: analyzing wayfinding processes in the outdoor environment. International Journal of Geographical Information Science, 1–31. https://doi.org/10.1080/13658816.2025.2473599

    Link to the paper: https://www.tandfonline.com/doi/full/10.1080/13658816.2025.2473599

    Folder Structure

    The folder named “submission” contains the following:

    1. “pythonProject”: This folder contains all the Python files and subfolders needed for analysis.
    2. ijgis.yml: This file lists all the Python libraries and dependencies required to run the code.

    Setting Up the Environment

    1. Use the ijgis.yml file to create a Python project and environment. Ensure you activate the environment before running the code.
    2. The pythonProject folder contains several .py files and subfolders, each with specific functionality as described below.

    Subfolders

    1. Data_4_IJGIS

    • This folder contains the data used for the results reported in the paper.
    • Note: The data analysis that we explain in this paper already begins with the synchronization and cleaning of the recorded raw data. The published data is already synchronized and cleaned. Both the cleaned files and the merged files with features extracted for them are given in this directory. If you want to perform the segmentation and feature extraction yourself, you should run the respective Python files yourself. If not, you can use the “merged_…csv” files as input for the training.

    2. results_[DateTime] (e.g., results_20240906_15_00_13)

    • This folder will be generated when you run the code and will store the output of each step.
    • The current folder contains results created during code debugging for the submission.
    • When you run the code, a new folder with fresh results will be generated.

    Python Files

    1. helper_functions.py

    • Contains reusable functions used throughout the analysis.
    • Each function includes a description of its purpose and the input parameters required.

    2. create_sanity_plots.py

    • Generates scatter plots like those in Figure 3 of the paper.
    • Although the code has been run for all 309 trials, it can be used to check the sample data provided.
    • Output: A .png file for each column of the raw gaze and IMU recordings, color-coded with logged events.
    • Usage: Run this file to create visualizations similar to Figure 3.

    3. overlapping_sliding_window_loop.py

    • Implements overlapping sliding window segmentation and generates plots like those in Figure 4.
    • Output:
      • Two new subfolders, “Gaze” and “IMU”, will be added to the Data_4_IJGIS folder.
      • Segmented files (default: 2–10 seconds with a 1-second step size) will be saved as .csv files.
      • A visualization of the segments, similar to Figure 4, will be automatically generated.

    4. gaze_features.py & imu_features.py (Note: there has been an update to the IDT function implementation in the gaze_features.py on 19.03.2025.)

    • These files compute features as explained in Tables 1 and 2 of the paper, respectively.
    • They process the segmented recordings generated by the overlapping_sliding_window_loop.py.
    • Usage: To see how the features are calculated, run these files after the sliding-window segmentation to compute the features from the segmented data.

    5. training_prediction.py

    • This file contains the main machine learning analysis of the paper: all the code for training the model, evaluating it, and using it for inference on the “monitoring part”. It covers the following steps:
    a. Data Preparation (corresponding to Section 5.1.1 of the paper)
    • Prepares the data according to the research question (RQ) described in the paper. Since this data was collected with several RQs in mind, we remove parts of the data that are not related to the RQ of this paper.
    • A function named plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out, but if you want to see visually what has been changed compared to the original data, you can uncomment this line.
    b. Training/Validation/Test Split
    • Splits the data for machine learning experiments (an explanation can be found in Section 5.1.1. Preparation of data for training and inference of the paper).
    • Make sure that you follow the instructions in the comments to the code exactly.
    • Output: The split data is saved as .csv files in the results folder.
    c. Machine and Deep Learning Experiments

    This part contains three main code blocks:

    • MLP Network (Commented Out): This code was used for classification with the MLP network, and the results shown in Table 3 are from this code. If you wish to use this model, please comment out the following blocks accordingly.
    • XGBoost without Hyperparameter Tuning: If you want to run the code but do not want to spend time on the full training with hyperparameter tuning (as was done for the paper), just uncomment this part. This will give you a simple, untuned model with which you can achieve at least some results.
    • XGBoost with Hyperparameter Tuning: If you want to train the model the way we trained it for the analysis reported in the paper, use this block (the plots in Figure 7 are from this block). We ran this block with different feature sets and different segmentation files and created a simple bar chart from the saved results, shown in Figure 6. A generic sketch of this kind of tuning setup is given after this list.

    Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2. Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.

    d. Inference (Monitoring Part)
    • Final inference is performed using the monitoring data. This step produces a .csv file containing inferred labels.
    • Figure 8 in the paper is generated using this part of the code.

    6. sequence_analysis.py

    • Performs analysis on the inferred data, producing Figures 9 and 10 from the paper.
    • This file reads the inferred data from the previous step and performs sequence analysis as described in Sections 5.2.1 and 5.2.2.
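
    For orientation, here is a generic Python sketch of XGBoost training with hyperparameter tuning, in the spirit of the tuned block described in training_prediction.py above; the file name, column names, label encoding, parameter grid and scoring choice are illustrative, not those used for the paper.

    # Generic XGBoost training with grid-search hyperparameter tuning (sketch).
    import pandas as pd
    from sklearn.model_selection import GridSearchCV, train_test_split
    from xgboost import XGBClassifier

    data = pd.read_csv("merged_features.csv")   # e.g. one of the "merged_...csv" files
    X = data.drop(columns=["label"])
    y = data["label"]                           # assumed integer-encoded class labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

    grid = {
        "n_estimators": [200, 400],
        "max_depth": [3, 5, 7],
        "learning_rate": [0.05, 0.1],
    }
    search = GridSearchCV(XGBClassifier(eval_metric="logloss"), grid, cv=5, scoring="f1_macro")
    search.fit(X_tr, y_tr)

    print("best params:", search.best_params_)
    print("held-out f1_macro:", search.score(X_te, y_te))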

    Licenses

    The data is licensed under CC-BY, the code is licensed under MIT.

  14. Customer Shopping Trends Dataset

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
    Explore at:
    Available download formats: zip (149,846 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, providing essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3,900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies. A minimal loading example follows the glossary below.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
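
    As a starting point, here is a minimal loading sketch (the CSV file name is an assumption; use the name of the file you downloaded from Kaggle):

    ```python
    # Minimal sketch: load the dataset and summarise it using the glossary's column names.
    import pandas as pd

    df = pd.read_csv("shopping_trends.csv")  # hypothetical file name

    # Average purchase amount per category.
    print(df.groupby("Category")["Purchase Amount (USD)"].mean().round(2))

    # Share of purchases with a discount applied, per season.
    print(df.groupby("Season")["Discount Applied"].apply(lambda s: (s == "Yes").mean()).round(3))
    ```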

    Structure of the Dataset

    Dataset structure overview image: https://i.imgur.com/6UEqejq.png

    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

  15. DBLP Records and Entries for Key Computer Science Conferences

    • data.mendeley.com
    Updated Mar 27, 2016
    Cite
    Swati Agarwal (2016). DBLP Records and Entries for Key Computer Science Conferences [Dataset]. http://doi.org/10.17632/3p9w84t5mr.1
    Explore at:
    Dataset updated
    Mar 27, 2016
    Authors
    Swati Agarwal
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset "DBLP-CSR.zip" is derived from the September 17, 2015 snapshot of the dblp bibliography database. It contains the last 16 years (2000-2015) of publication records from 81 Computer Science research conferences, used for the study conducted in our paper "Women in Computer Science Research: What is Bibliography Data Telling Us?", published in ACM SIGCAS Computers and Society Newsletter, Volume 46, Issue 1, February 2016. Link to the Newsletter archive: http://dl.acm.org/citation.cfm?id=J198

    The dataset contains seven .sql files and a README file describing the dataset and its attributes. The .sql files are named affiliation_coord.sql, affiliation.sql, author_gender.sql, authors.sql, editor_gender.sql, editor.sql, and main.sql.

    The affiliation_coord.sql, affiliation.sql, authors.sql, and editor.sql files create tables of the same name, while main.sql, editor_gender.sql, and author_gender.sql create the tables general, genedit, and genauth_old, respectively.

    The following is a list and description of all attributes used in the dataset; attributes shared by several tables are listed only once. A minimal query sketch follows the table descriptions.

    1. Table: general

    • k - unique id of each article; primary key of the table
    • year - year of publication
    • conf - abbreviation of the conference name (e.g., HT for ACM HyperText)
    • crossref - cross-reference link to all articles published in a conference in a given year
    • cs, de, se, th - binary attributes denoting whether a conference belongs to these domains (Computer Science, Data Engineering, Software Engineering, Theory)
    • publisher - name of the conference publisher
    • link - unique DOI link to the article, redirecting to the conference publisher's page

    2. Table: authors

    • pos - position of the author in the paper; 0 denotes the first author
    • name - unique name of the author in the dblp dataset
    • gender - gender of the author; a hyphen (-) denotes that the gender was not determined (please refer to the paper for more details)
    • prob - probability of a name being M, F, or -

    3. Table: editors

    • k - foreign key for the crossref attribute in the general table
    • pos - position of the editor in the conference; 0 denotes the first editor

    4. Tables: genauth_old and genedit

    • contain the records of gender information of authors and editors, derived from the authors and editors tables

    5. Table: affiliation

    • affil - affiliation record of each author publishing in the 81 conferences mentioned above
    • year - year of publication

    6. Table: affiliation_coord

    • country - country of the author, extracted from the affiliation
    • country_code - country code to be used for maps
    • lat, lng - latitude and longitude of the affiliation
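
    As a minimal illustration of querying these tables (assuming the .sql dumps have been imported into a local SQLite database named dblp_csr.db; the original files may target MySQL, so adjust the connection accordingly):

    ```python
    # Minimal sketch: count first authors by gender across all records.
    # Assumes the dumps were imported into a local SQLite file "dblp_csr.db".
    import sqlite3

    conn = sqlite3.connect("dblp_csr.db")
    rows = conn.execute(
        "SELECT gender, COUNT(*) FROM authors WHERE pos = 0 GROUP BY gender"
    ).fetchall()
    for gender, count in rows:
        print(gender, count)
    conn.close()
    ```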

  16. Data from: OpenAIRE Usage Counts. The analytics service of OpenAIRE Research...

    • data.europa.eu
    unknown
    Updated Nov 8, 2020
    Cite
    Zenodo (2020). OpenAIRE Usage Counts. The analytics service of OpenAIRE Research Graph [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4268144?locale=hu
    Explore at:
    unknown(1992696)Available download formats
    Dataset updated
    Nov 8, 2020
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Usage metrics for all types of scholarly output are one of the measures used to assess Open Access impact and are a value-added service of Open Access repositories. OpenAIRE has a successful record in providing usage metrics for a large number of repositories from around the world. OpenAIRE's Usage Counts service collects usage activity from OpenAIRE content providers, such as institutional repositories or national aggregators like IRUS-UK or LaReferencia, for usage events related to research products of the OpenAIRE graph, such as publications. It subsequently creates and deploys aggregated statistics for these products and delivers standardized activity reports about research usage and uptake, following the COUNTER Code of Practice. It complements existing citation mechanisms and assists institutional repository managers, research communities, research organizations, funders and policy makers in tracking and evaluating research from an early stage. Following its successful record with publications, OpenAIRE's Usage Counts service is ready to be applied to another product of the OpenAIRE research graph: research data. The service will monitor and analyze usage activity for research data repositories, as well as usage reports from aggregators like DataCite. This usage will not only be aggregated but also combined with usage activity from publications, by exploiting other OpenAIRE services such as the OpenAIRE ScholeXplorer. In this manner the OpenAIRE Usage Counts service will operate as a hub of usage statistics, linking together all kinds of scholarly output, offering a value-added service for Open Access and realizing the Open Analytics framework and infrastructure required for scientific reward in the European Open Science Cloud. From the technical perspective, usage data will be collected in two ways: (1) by collecting usage events directly from data repositories, and (2) from research data statistics aggregators exposing consolidated statistics via SUSHI-Lite. The final outcome is an OpenAIRE service for tracking, collection, cleaning, analysis, evaluation and COUNTER-compliant reporting of research data usage, combined with other products from the OpenAIRE research graph. The poster describes two aspects: 1) the potential of the OpenAIRE Usage Counts service to explore a number of multidimensional scholarly performance indicators; 2) its contribution as a Usage Counts hub to services aggregating OpenAIRE Research Graph product-level metrics.

  17. CollegeScorecard US College Graduation and Opportunity Data

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    The Devastator (2023). CollegeScorecard US College Graduation and [Dataset]. https://www.kaggle.com/datasets/thedevastator/collegescorecard-us-college-graduation-and-oppor/discussion
    Explore at:
    zip(6248358 bytes)Available download formats
    Dataset updated
    Jan 12, 2023
    Authors
    The Devastator
    Description

    CollegeScorecard US College Graduation and Opportunity Data

    Exploring Student Success and Outcomes

    By Noah Rippner [source]

    About this dataset

    This dataset provides an in-depth look at the data elements for the US College CollegeScorecard Graduation and Opportunity Project use case. It contains information on the variables used to create a comprehensive report, including Year, dev-category, developer-friendly name, VARIABLE NAME, API data type, label, VALUE, LABEL, SCORECARD? Y/N, SOURCE, and NOTES. The data is provided by the U.S. Department of Education and allows parents, students and policymakers to take meaningful action to improve outcomes. The dataset contains more than enough information to allow someone like Maria - a 25-year-old recent US Army veteran who wants a degree in Management Systems and Information Technology - to distinguish between her school options, access services, and find affordable housing near high-quality schools located in safe neighborhoods with access to transport links and nearby employment opportunities. This highly useful dataset provides detailed analysis of all these criteria so that users can make an informed decision about which school is best for them.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset contains data related to college students, including their college graduation rates, access-to-opportunity indicators such as geographic mobility and career readiness, and other important indicators of the overall learning experience in the United States. This guide will show you how to use this dataset to draw meaningful conclusions about higher education in America.

    First, you will need to be familiar with the different fields included in the CollegeScorecard US College Graduation and Opportunity data set. Each record is comprised of several data elements, defined by concise labels at the left of each observation row: Name of Data Element, Year, dev-category (the developmental category), VARIABLE NAME, API data type (the type information used by the programmatic interface), LABEL (descriptive labeling for visual reporting), VALUE, SCORECARD? Y/N (whether the field pertains to the U.S. Department of Education's College Scorecard program), SOURCE (where the source of the variable can be found), and NOTES (further details about the variable that can be used for analysis or comparison across observations).

    Now that you understand the components of each record, here are some key steps you can take when working with this dataset (a minimal sketch follows the list):

    • Apply year-specific filters on selected fields if needed, e.g., Year = 2020 and API data type = Character.
    • Look up any 'NCalPlaceHolder' values where applicable; these are placeholders indicating values that have been omitted from the Scorecard display because of conflicting formatting requirements, or that have not yet been updated, so check again after the next release cycle.
    • Pivot the data into more customized tabular outputs, reducing the complex, unstructured raw sources into more digestible intermediate datasets that can be consumed in Power BI or Tableau, going beyond the delimited text exports provided by default.
    • Explore correlations between education metrics and third-party indicators, such as measures of educational outcomes and return-on-investment potential, rather than relying solely on campus reputation or social media metrics.
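
    A minimal sketch of the filter and pivot steps (the file name and exact column spellings are assumptions based on the description above):

    ```python
    # Minimal sketch: filter the data dictionary by year and API data type,
    # then pivot variable counts per dev-category. File and column names are assumed.
    import pandas as pd

    dd = pd.read_csv("CollegeScorecardDataDictionary.csv")  # hypothetical file name

    # Year-specific filter, e.g. Year = 2020 and API data type = Character.
    subset = dd[(dd["Year"] == 2020) & (dd["API data type"] == "Character")]
    print(subset.head())

    # Pivot: number of variables of each API data type within each dev-category.
    pivot = pd.pivot_table(
        dd, index="dev-category", columns="API data type",
        values="VARIABLE NAME", aggfunc="count", fill_value=0,
    )
    print(pivot)
    ```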

    Research Ideas

    • Creating an interactive dashboard to compare school performance in terms of safety, entrepreneurship and other criteria.
    • Using the data to create a heat map visualization that shows which cities are most conducive to a successful educational experience for students like Maria.
    • Gathering information about average course costs at different universities and mapping them against US unemployment rates to indicate which states might offer the best value for money when it comes to higher education expenses.

    Ack...

  18. Forecast revenue big data market worldwide 2011-2027

    • statista.com
    Updated Mar 15, 2018
    Cite
    Statista (2018). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
    Explore at:
    Dataset updated
    Mar 15, 2018
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Worldwide
    Description

    The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027. What is big data? Big data is a term that refers to data sets that are too large or too complex for traditional data processing applications. It is defined as having one or more of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. Big data analytics: Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.

  19. Requirements data sets (user stories)

    • zenodo.org
    • data.mendeley.com
    txt
    Updated Jan 13, 2025
    Cite
    Fabiano Dalpiaz; Fabiano Dalpiaz (2025). Requirements data sets (user stories) [Dataset]. http://doi.org/10.17632/7zbk8zsd8y.1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Mendeley Data
    Authors
    Fabiano Dalpiaz; Fabiano Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of 22 data sets of 50+ requirements each, expressed as user stories.

    The dataset has been created by gathering data from web sources and we are not aware of license agreements or intellectual property rights on the requirements / user stories. The curator took utmost diligence in minimizing the risks of copyright infringement by using non-recent data that is less likely to be critical, by sampling a subset of the original requirements collection, and by qualitatively analyzing the requirements. In case of copyright infringement, please contact the dataset curator (Fabiano Dalpiaz, f.dalpiaz@uu.nl) to discuss the possibility of removal of that dataset [see Zenodo's policies]

    The data sets have been originally used to conduct experiments about ambiguity detection with the REVV-Light tool: https://github.com/RELabUU/revv-light

    This collection has been originally published in Mendeley data: https://data.mendeley.com/datasets/7zbk8zsd8y/1

    Overview of the datasets [data and links added in December 2024]

    The following text provides a description of the datasets, including links to the systems and websites, when available. The datasets are organized by macro-category and then by identifier.

    Public administration and transparency

    g02-federalspending.txt (2018) originates from early data in the Federal Spending Transparency project, which pertains to the website used to publicly share the spending data of the U.S. government. The website was created because of the Digital Accountability and Transparency Act of 2014 (DATA Act). The specific dataset pertains to a system called DAIMS or Data Broker, which stands for DATA Act Information Model Schema. The sample that was gathered refers to a sub-project related to allowing the government to act as a data broker, thereby providing data to third parties. The data for the Data Broker project is currently not available online, although the backend seems to be hosted on GitHub under a CC0 1.0 Universal license. Current and recent snapshots of federal spending related websites, including many more projects than the one described in the shared collection, can be found here.

    g03-loudoun.txt (2018) is a set of requirements extracted from a document by Loudoun County, Virginia, that describes the to-be user stories and use cases for a land management readiness assessment system called Loudoun County LandMARC. The source document can be found here and is part of the Electronic Land Management System and EPlan Review Project RFP/RFQ issued in March 2018. More information about the overall LandMARC system and services can be found here.

    g04-recycling.txt (2017) concerns a web application where recycling and waste disposal facilities can be searched and located. The application operates through the visualization of a map that the user can interact with. The dataset was obtained from a GitHub repository and is the basis of a students' project on web site design; the code is available (no license).

    g05-openspending.txt (2018) is about the OpenSpending project (www), a project of the Open Knowledge Foundation which aims at transparency about how local governments spend money. At the time of the collection, the data was retrieved from a Trello board that is currently unavailable. The sample focuses on publishing, importing and editing datasets, and how the data should be presented. Currently, OpenSpending is managed via a GitHub repository which contains multiple sub-projects with unknown license.

    g11-nsf.txt (2018) refers to a collection of user stories referring to the NSF Site Redesign & Content Discovery project, which originates from a publicly accessible GitHub repository (GPL 2.0 license). In particular, the user stories refer to an early version of the NSF's website. The user stories can be found as closed Issues.

    (Research) data and meta-data management

    g08-frictionless.txt (2016) regards the Frictionless Data project, which offers an open source dataset for building data infrastructures, to be used by researchers, data scientists, and data engineers. Links to the many projects within the Frictionless Data project are on GitHub (with a mix of Unlicense and MIT license) and web. The specific set of user stories has been collected in 2016 by GitHub user @danfowler and are stored in a Trello board.

    g14-datahub.txt (2013) concerns the open source project DataHub, which is currently developed via a GitHub repository (the code has Apache License 2.0). DataHub is a data discovery platform which has been developed over multiple years. The specific data set is an initial set of user stories, which we can date back to 2013 thanks to a comment therein.

    g16-mis.txt (2015) is a collection of user stories that pertains to a repository for researchers and archivists. The source of the dataset is a public Trello repository. Although the user stories do not have explicit links to projects, it can be inferred that the stories originate from some project related to the library of Duke University.

    g17-cask.txt (2016) refers to the Cask Data Application Platform (CDAP). CDAP is an open source application platform (GitHub, under Apache License 2.0) that can be used to develop applications within the Apache Hadoop ecosystem, an open-source framework which can be used for distributed processing of large datasets. The user stories are extracted from a document that includes requirements regarding dataset management for Cask 4.0, which includes the scenarios, user stories and a design for the implementation of these user stories. The raw data is available in the following environment.

    g18-neurohub.txt (2012) is concerned with the NeuroHub platform, a neuroscience data management, analysis and collaboration platform for researchers in neuroscience to collect, store, and share data with colleagues or with the research community. The user stories were collected at a time when NeuroHub was still a research project sponsored by the UK Joint Information Systems Committee (JISC). For information about the research project from which the requirements were collected, see the following record.

    g22-rdadmp.txt (2018) is a collection of user stories from the Research Data Alliance's working group on DMP Common Standards. Their GitHub repository contains a collection of user stories that were created by asking the community to suggest functionality that should be part of a website that manages data management plans. Each user story is stored as an issue on the GitHub page.

    g23-archivesspace.txt (2012-2013) refers to ArchivesSpace: an open source, web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and
    born digital content; management of authorities and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. ArchivesSpace is open source and its

  20. [JeDI] - Jellyfish Database Initiative: Global records on gelatinous...

    • erddap.bco-dmo.org
    Updated Apr 3, 2018
    + more versions
    Cite
    BCO-DMO (2018). [JeDI] - Jellyfish Database Initiative: Global records on gelatinous zooplankton for the past 200 years, collected from global sources and literature (Trophic BATS project) (Plankton Community Composition and Trophic Interactions as Modifiers of Carbon Export in the Sargasso Sea ) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_526852/index.html
    Explore at:
    Dataset updated
    Apr 3, 2018
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/526852/licensehttps://www.bco-dmo.org/dataset/526852/license

    Area covered
    Sargasso Sea,
    Variables measured
    day, date, year, depth, month, taxon, contact, density, latitude, net_mesh, and 27 more
    Description

    The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence only records of gelatinous zooplankton spanning the past four centuries (1790-2011) assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order are reported for all records. Other auxiliary metadata, such as physical, environmental and biometric information relating to the gelatinous zooplankton metadata, are included with each respective entry. JeDI has been developed and designed as an open access research tool for the scientific community to quantitatively define the global baseline of gelatinous zooplankton populations and to describe long-term and large-scale trends in gelatinous zooplankton populations and blooms. It has also been constructed as a future repository of datasets, thus allowing retrospective analyses of the baseline and trends in global gelatinous zooplankton populations to be conducted in the future.

    Acquisition: This information has been synthesized by members of the Global Jellyfish Group from online databases and from unpublished and published datasets. More specific details may be found in the methods section of Lucas, C.J., et al. 2014. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecol. Biogeogr. (DOI: 10.1111/geb.12169).

    Key metadata (from the ERDDAP record):
    • Access formats: .htmlTable, .csv, .json, .mat, .nc, .tsv, .esriCsv, .geoJson
    • DOI: 10.1575/1912/7191; dataset page: https://www.bco-dmo.org/dataset/526852
    • Geographic coverage: latitude -78.5 to 88.74 degrees north, longitude -180 to 180 degrees east; vertical range -10191.48 to 7632.0 m (positive down)
    • Version 2015.01.08; duplicate records were removed on 2015.01.08; the displayed view of this dataset is subject to updates
    • People: Robert Condon (University of North Carolina - Wilmington, Principal Investigator); Carlos M. Duarte (University of Western Australia, Co-Principal Investigator); Cathy Lucas (National Oceanography Centre, Co-Principal Investigator); Kylie Pitt (Griffith University, Co-Principal Investigator); Danie Kinkade (Woods Hole Oceanographic Institution, BCO-DMO Data Manager)
    • Funding: NSF Division of Ocean Sciences (NSF OCE), award OCE-1030149; program manager David L. Garrison
    • Project: Trophic BATS (Plankton Community Composition and Trophic Interactions as Modifiers of Carbon Export in the Sargasso Sea), Sargasso Sea, BATS site, October 2010 to September 2014

    Trophic BATS project description: Fluxes of particulate carbon from the surface ocean are greatly influenced by the size, taxonomic composition and trophic interactions of the resident planktonic community. Large and/or heavily-ballasted phytoplankton such as diatoms and coccolithophores are key contributors to carbon export due to their high sinking rates and direct routes of export through large zooplankton. The potential contributions of small, unballasted phytoplankton, through aggregation and/or trophic re-packaging, have been recognized more recently. This recognition comes as direct observations in the field show unexpected trends. In the Sargasso Sea, for example, shallow carbon export has increased in the last decade but the corresponding shift in phytoplankton community composition during this time has not been towards larger cells like diatoms. Instead, the abundance of the picoplanktonic cyanobacterium Synechococcus has increased significantly. The trophic pathways that link the increased abundance of Synechococcus to carbon export have not been characterized. These observations helped to frame the overarching research question, "How do plankton size, community composition and trophic interactions modify carbon export from the euphotic zone?". Since small phytoplankton are responsible for the majority of primary production in oligotrophic subtropical gyres, the trophic interactions that include them must be characterized in order to achieve a mechanistic understanding of the function of the biological pump in the oligotrophic regions of the ocean.
    This requires a complete characterization of the major organisms and their rates of production and consumption. Accordingly, the research objectives are: 1) to characterize (qualitatively and quantitatively) trophic interactions between major plankton groups in the euphotic zone and rates of, and contributors to, carbon export, and 2) to develop a constrained food web model, based on these data, that will allow us to better understand current and predict near-future patterns in export production in the Sargasso Sea. The investigators will use a combination of field-based process studies and food web modeling to quantify rates of carbon exchange between key components of the ecosystem at the Bermuda Atlantic Time-series Study (BATS) site. Measurements will include a novel DNA-based approach to characterizing and quantifying planktonic contributors to carbon export. The well-documented seasonal variability at BATS and the occurrence of mesoscale eddies will be used as a natural laboratory in which to study ecosystems of different structure. This study is unique in that it aims to characterize multiple food web interactions and carbon export simultaneously and over similar time and space scales. A key strength of the proposed research is also the tight connection and feedback between the data collection and modeling components. Characterizing the complex interactions between the biological community and export production is critical for predicting changes in phytoplankton species dominance, trophic relationships and export production that might occur under scenarios of climate-related changes in ocean circulation and mixing. The results from this research may also contribute to understanding of the biological mechanisms that drive current regional to basin scale variability in carbon export in oligotrophic gyres.
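
    The dataset is served through the BCO-DMO ERDDAP server. As a minimal access sketch (the tabledap URL pattern below follows standard ERDDAP conventions and is an assumption; adjust it if the server exposes the dataset differently):

    ```python
    # Minimal sketch: read a few JeDI columns from the ERDDAP tabledap endpoint.
    # The URL pattern is assumed from standard ERDDAP conventions.
    import pandas as pd

    url = "https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_526852.csv"
    jedi = pd.read_csv(url, skiprows=[1])  # ERDDAP CSVs include a units row after the header
    print(jedi[["year", "latitude", "longitude", "taxon", "density"]].head())
    ```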
