17 datasets found
  1. Positive and Negative Word List.rar

    • kaggle.com
    zip
    Updated Sep 12, 2020
    Cite
    mukul (2020). Positive and Negative Word List.rar [Dataset]. https://www.kaggle.com/mukulkirti/positive-and-negative-word-listrar
    Explore at:
    Available download formats: zip (84813 bytes)
    Dataset updated
    Sep 12, 2020
    Authors
    mukul
    Description

    Context

    The idea behind this dataset is to make positive and negative word senses available for research. It was created for language-related research such as NLP, AI, behaviour detection and more. It helps to:
    1. Research whether the language used in science abstracts can skew towards the use of strikingly positive and negative words over time.
    2. Normalise the yearly frequencies of positive, negative, and neutral words, plus 100 randomly selected words, for the total number of abstracts.
    3. Run subanalyses including pattern quantification of individual words, specificity for selected high impact journals, and comparison between author affiliations within or outside countries with English as the official majority language.

    In one analysis, frequency patterns were compared with 4% of all books ever printed and digitised, by use of the Google Books Ngram Viewer. The main outcome measures were the frequencies of positive and negative words in abstracts compared with the frequencies of words with a neutral and random connotation, expressed as relative change since 1980, so the dataset can help in these tasks too. Results: the absolute frequency of positive words increased from 2.0% (1974-80) to 17.5% (2014), a relative increase of 880% over four decades. All 25 individual positive words contributed to the rise, particularly the words “robust,” “novel,” “innovative,” and “unprecedented,” which increased by up to 15 000%. Comparable but less pronounced results were obtained when restricting the analysis to selected journals with high impact factors. Authors affiliated to an institute in a non-English speaking country used significantly more positive words. Negative word frequencies increased from 1.3% (1974-80) to 3.2% (2014), a relative increase of 257%. Over the same period, no apparent increase was found in neutral or random word use, or in the frequency of positive word use in published books. So lexicographic analysis indicates that scientific abstracts are currently written with more positive and negative words, and provides an insight into the evolution of scientific writing. Apparently scientists look on the bright side of research results. So this dataset can play a major role in research.

    Content

    About the dataset:
    1. The dataset is in Excel file format.
    2. The dataset has two columns: (I) Negative Word List, (II) Positive Word List.
    3. The dataset contains 4699 positive words and 4722 negative words in total.
    4. The data was collected from different sources.
    5. The dataset has some null (NaN) values.
    6. Please check the data once before use.
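
    Because the description warns about null values, a small cleaning pass may be useful before the word lists are used. The following is a minimal sketch, assuming the two columns have already been read into Python lists by some spreadsheet reader; the sample words and the `clean_column` helper are hypothetical and not part of the dataset.

```python
import math

def clean_column(values):
    """Return non-empty, lowercased, de-duplicated words, preserving order."""
    seen = set()
    cleaned = []
    for value in values:
        # Skip the null markers a spreadsheet reader may produce: None or float NaN.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        word = str(value).strip().lower()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned

# Hypothetical sample rows standing in for the two spreadsheet columns.
negative_raw = ["abnormal", "Abort", float("nan"), "abort", None]
positive_raw = ["abound", " accurate ", None, float("nan"), "admire"]

negative_words = clean_column(negative_raw)
positive_words = clean_column(positive_raw)
print(negative_words)
print(positive_words)
```

    Lowercasing and de-duplication are design choices of this sketch; whether the real lists contain case variants or duplicates is not stated in the description.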

    Inspiration

    Just to see how it can help in many NLP related Tasks.

  2. Data articles in journals

    • zenodo.org
    bin, csv, txt
    Updated Sep 21, 2023
    + more versions
    Cite
    Carlota Balsa-Sanchez; Vanesa Loureiro (2023). Data articles in journals [Dataset]. http://doi.org/10.5281/zenodo.7458466
    Explore at:
    Available download formats: bin, txt, csv
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carlota Balsa-Sanchez; Vanesa Loureiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Last Version: 4

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/12/15

    General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
    - data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 4th version
    - Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
    - Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.

    Version: 3

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/10/28

    General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers or/and software papers could be published
    - data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 3rd version
    - Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
    - Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

    Erratum - Data articles in journals Version 3:

    Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
    Data -- ISSN 2306-5729 -- JCR (JIF) n/a
    Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

    Version: 2

    Author: Francisco Rubio, Universitat Politècnica de València.

    Date of data collection: 2020/06/23

    General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers or/and software papers could be published
    - data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 2nd version
    - Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
    - Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

    Total size: 32 KB

    Version 1: Description

    This dataset contains a list of journals that publish data articles, code, software articles and database articles.

    The search strategy in DOAJ and Ulrichsweb was the search for the word data in the title of the journals.
    Acknowledgements:
    Xaquín Lores Torres for his invaluable help in preparing this dataset.

  3. Global data set on micro- and mesoplastic loads in marine sediments

    • data.mendeley.com
    Updated Oct 18, 2021
    Cite
    Cecilia Martin (2021). Global data set on micro- and mesoplastic loads in marine sediments [Dataset]. http://doi.org/10.17632/6k38hr5zhw.1
    Explore at:
    Dataset updated
    Oct 18, 2021
    Authors
    Cecilia Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide two files: an Excel file named "Global data set on micro- and mesoplastic loads in marine sediments" and a PDF file named "Metadata-Dataset". The Excel file provides the dataset and the list of references from which the data were extracted or derived. The PDF file provides a detailed description of the dataset and of the methods used to extract and derive data.

  4. Privacy Shield Lists of U.S. Companies

    • catalog.data.gov
    Updated Sep 30, 2025
    + more versions
    Cite
    International Trade Administration (2025). Privacy Shield Lists of U.S. Companies [Dataset]. https://catalog.data.gov/dataset/privacy-shield-lists-of-u-s-companies-822c6
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    International Trade Administration (http://trade.gov/)
    Area covered
    United States
    Description

    The EU-U.S. and Swiss-U.S. Privacy Shield Frameworks are mechanisms that companies can use to comply with data protection requirements when transferring personal data from the European Union and Switzerland to the United States. ITA's Privacy Shield Team maintains two lists that are made available to the public: 1) the Privacy Shield Active List, and 2) the Privacy Shield Inactive List. The Active List is an authoritative list of U.S. organizations that have self-certified to the Department of Commerce and declared their commitment to adhere to the Privacy Shield Principles. The Inactive List is an authoritative list of U.S. organizations that are no longer self-certified under Privacy Shield and are therefore no longer assured of the benefits of using Privacy Shield to receive personal data from the European Union and/or Switzerland. Upon request, the Privacy Shield Team may provide a copy of the list in the form of an Excel spreadsheet.

  5. List of all countries with their 2 digit codes (ISO 3166-1)

    • datahub.io
    Updated Aug 29, 2017
    Cite
    (2017). List of all countries with their 2 digit codes (ISO 3166-1) [Dataset]. https://datahub.io/core/country-list
    Explore at:
    Dataset updated
    Aug 29, 2017
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    ISO 3166-1-alpha-2 English country names and code elements. This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.
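
    A list like this is typically used as a lookup table from two-letter code to country name. The sketch below assumes the data is available as CSV text with `Name` and `Code` headers; those header names and the three sample rows are assumptions for illustration and should be checked against the actual file.

```python
import csv
import io

# Hypothetical excerpt; the "Name" and "Code" headers are assumed, not confirmed.
SAMPLE_CSV = """Name,Code
Afghanistan,AF
Åland Islands,AX
Albania,AL
"""

def load_country_codes(text):
    """Map each alpha-2 code (upper case) to its English short name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Code"].upper(): row["Name"] for row in reader}

codes = load_country_codes(SAMPLE_CSV)
print(codes["AX"])
```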

  6. Analysis of CBCS publications for Open Access, data availability statements...

    • figshare.scilifelab.se
    • researchdata.se
    • +2more
    txt
    Updated Jan 15, 2025
    Cite
    Theresa Kieselbach (2025). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/scilifelab.23641749.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Umeå University
    Authors
    Theresa Kieselbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description

    This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.

    1. Is the research article an Open Access publication?
    2. Does the research article have a Creative Commons license or a similar license?
    3. Does the research article contain a data availability statement?
    4. Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?
    5. Does the research article contain supplementary data?
    6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

    Variables

    The data were compiled in a Microsoft Excel 365 document that includes the following variables.

    1. DOI URL of research article
    2. Year of publication
    3. Research article published with Open Access
    4. License for research article
    5. Data availability statement in article
    6. Supplementary data added to article
    7. Persistent identifier for supplementary data
    8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC

    Visualization

    Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications during a year, the number of publications that are published with open access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

    File formats and software

    The file formats used in this dataset are:

    • .csv (Text file)
    • .docx (Microsoft Word 365 file)
    • .jpg (JPEG image file)
    • .pdf/A (Portable Document Format for archiving)
    • .png (Portable Network Graphics image file)
    • .pptx (Microsoft Power Point 365 file)
    • .txt (Text file)
    • .xlsx (Microsoft Excel 365 file)

    All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

    MD5 checksums

    Here is a list of all files of this dataset and of their MD5 checksums.

    1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
    2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
    3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
    4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
    5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
    6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
    7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
    8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
    9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
    10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
    11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
    12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
    13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
    14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
    15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
    16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
    17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
    18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
    19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
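
    The published MD5 checksums can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the demonstration runs on a throwaway temporary file rather than on a file from this dataset.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 equals the published value."""
    return md5_of_file(path) == expected_md5.strip().lower()

# Demonstration on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_md5 = md5_of_file(demo_path)
demo_ok = matches_checksum(demo_path, hashlib.md5(b"hello").hexdigest())
os.remove(demo_path)
print(demo_md5, demo_ok)
```

    After downloading, a call such as `matches_checksum("Readme.txt", "795f171be340c13d78ba8608dafb3e76")` would compare the local copy against the value listed above, assuming the file sits in the working directory.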

  7. 🦈 Shark Tank India dataset 🇮🇳

    • kaggle.com
    zip
    Updated Oct 5, 2025
    Cite
    Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
    Explore at:
    Available download formats: zip (45970 bytes)
    Dataset updated
    Oct 5, 2025
    Authors
    Satya Thirumani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Shark Tank India Data set.

    Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

    All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

    Here is the data dictionary for the (Indian) Shark Tank dataset.

    • Season Number - Season number
    • Startup Name - Company name or product name
    • Episode Number - Episode number within the season
    • Pitch Number - Overall pitch number
    • Season Start - Season first aired date
    • Season End - Season last aired date
    • Original Air Date - Episode original/first aired date, on OTT/TV
    • Episode Title - Episode title in SonyLiv
    • Anchor - Name of the episode presenter/host
    • Industry - Industry name or type
    • Business Description - Business Description
    • Company Website - Company Website URL
    • Started in - Year in which startup was started/incorporated
    • Number of Presenters - Number of presenters
    • Male Presenters - Number of male presenters
    • Female Presenters - Number of female presenters
    • Transgender Presenters - Number of transgender/LGBTQ presenters
    • Couple Presenters - Are presenters wife/husband ? 1-yes, 0-no
    • Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old
    • Pitchers City - Presenter's town/city or place where company head office exists
    • Pitchers State - Indian state pitcher hails from or state where company head office exists
    • Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue
    • Monthly Sales - Total monthly sales, in lakhs
    • Gross Margin - Gross margin/profit of company, in percentages
    • Net Margin - Net margin/profit of company, in percentages
    • EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization
    • Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)
    • SKUs - Stock Keeping Units or number of varieties, at the time of pitch
    • Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch
    • Bootstrapped - Startup is bootstrapped or not (yes/no)
    • Part of Match off - Competition between two similar brands, pitched at same time
    • Original Ask Amount - Original Ask Amount, in lakhs INR
    • Original Offered Equity - Original Offered Equity, in percentages
    • Valuation Requested - Valuation Requested, in lakhs INR
    • Received Offer - Received offer or not, 1-received, 0-not received
    • Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected
    • Total Deal Amount - Total Deal Amount, in lakhs INR
    • Total Deal Equity - Total Deal Equity, in percentages
    • Total Deal Debt - Total Deal debt/loan amount, in lakhs INR
    • Debt Interest - Debt interest rate, in percentages
    • Deal Valuation - Deal Valuation, in lakhs INR
    • Number of sharks in deal - Number of sharks involved in deal
    • Deal has conditions - Deal has conditions or not? (yes or no)
    • Royalty Percentage - Royalty percentage, if it's royalty deal
    • Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs
    • Advisory Shares Equity - Deal with Advisory shares or equity, in percentages
    • Namita Investment Amount - Namita Investment Amount, in lakhs INR
    • Namita Investment Equity - Namita Investment Equity, in percentages
    • Namita Debt Amount - Namita Debt Amount, in lakhs INR
    • Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR
    • Vineeta Investment Equity - Vineeta Investment Equity, in percentages
    • Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR
    • Anupam Investment Amount - Anupam Investment Amount, in lakhs INR
    • Anupam Investment Equity - Anupam Investment Equity, in percentages
    • Anupam Debt Amount - Anupam Debt Amount, in lakhs INR
    • Aman Investment Amount - Aman Investment Amount, in lakhs INR
    • Aman Investment Equity - Aman Investment Equity, in percentages
    • Aman Debt Amount - Aman Debt Amount, in lakhs INR
    • Peyush Investment Amount - Peyush Investment Amount, in lakhs INR
    • Peyush Investment Equity - Peyush Investment Equity, in percentages
    • Peyush Debt Amount - Peyush Debt Amount, in lakhs INR
    • Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR
    • Ritesh Investment Equity - Ritesh Investment Equity, in percentages
    • Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR
    • Amit Investment Amount - Amit Investment Amount, in lakhs INR
    • Amit Investment Equity - Amit Investment Equity, in percentages
    • Amit Debt Amount - Amit Debt Amount, in lakhs INR
    • Guest Investment Amount - Guest Investment Amount, in lakhs INR
    • Guest Investment Equity - Guest Investment Equity, in percentages
    • Guest Debt Amount - Guest Debt Amount, in lakhs INR
    • Invested Guest Name - Name of the guest(s) who invested in deal
    • All Guest Names - Name of all guests, who are present in episode
    • Namita Present - Whether Namita present in episode or not
    • Vineeta Present - Whether Vineeta present in episode or not
    • Anupam ...
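
    Most monetary fields above are given in lakhs INR and equity in percentages, so two small conversions come up often. The sketch below assumes the standard relation valuation = amount / equity fraction; whether the dataset's `Valuation Requested` and `Deal Valuation` columns were computed exactly this way is an assumption, and the pitch figures are invented.

```python
LAKH = 100_000  # 1 lakh = 100,000 rupees

def lakhs_to_inr(amount_lakhs):
    """Convert an amount in lakhs INR to rupees."""
    return amount_lakhs * LAKH

def implied_valuation_lakhs(amount_lakhs, equity_percent):
    """Valuation implied by offering `equity_percent` for `amount_lakhs`."""
    if equity_percent <= 0:
        raise ValueError("equity percentage must be positive")
    return amount_lakhs * 100 / equity_percent

# Hypothetical pitch: asking 50 lakhs for 5% equity.
ask_valuation = implied_valuation_lakhs(50, 5)
ask_valuation_inr = lakhs_to_inr(ask_valuation)
print(ask_valuation, ask_valuation_inr)
```

    Comparing this implied figure with the recorded valuation columns is one quick consistency check when exploring the data.
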
  8. First name statistics for newborns by year of birth in Münster | gimi9.com

    • gimi9.com
    Updated Dec 15, 2024
    Cite
    (2024). First name statistics for newborns by year of birth in Münster | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6d04e8b3-ed6e-406e-b85f-befe678205c2
    Explore at:
    Dataset updated
    Dec 15, 2024
    Description

    This data set contains the first name statistics for newborns in Münster from 2007 to 2021. Two different lists are made available:

    • A first name hit list with the top 30 most commonly used first names, grouped by year of birth and gender.
    • A list of “first name numbers”. This list shows how many babies have been given multiple first names.

    First name hit list

    The table with the first name hit list contains the following columns:

    • Year = year of birth
    • Rank = top 30 rank
    • Gender = girl or boy
    • Name = the chosen name
    • Number = number of children with this name

    Please note the following additional information: all given first names are taken into account for the calculation of the first name list, i.e. also the second and third names. For example, if “Tom” leads the list in a year, that doesn't mean that Tom was the most popular first name, but that Tom was the most frequently mentioned name among all first, second, third and other given names for babies.

    First name number

    The table with the first name number contains the following columns:

    • Year = year of birth
    • Children with.. = how many first names
    • Number = number of children

    The data set consists of an Excel file, which contains both lists in different spreadsheets, as well as two corresponding CSV files.
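
    As an illustration of how the hit list columns fit together, the following sketch picks the most frequent name per year and gender. It assumes a comma-separated file with the English column names given in the description; the real CSV may use different headers or delimiters, and the sample rows are invented placeholders.

```python
import csv
import io

# Hypothetical rows; column names follow the description, values are invented.
SAMPLE = """Year,Rank,Gender,Name,Number
2020,1,girl,Name A,40
2020,2,girl,Name B,35
2020,1,boy,Name C,42
2021,1,girl,Name B,38
"""

def top_names(text):
    """Return {(year, gender): (name, number)} for the most frequent name."""
    best = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["Year"]), row["Gender"])
        count = int(row["Number"])
        if key not in best or count > best[key][1]:
            best[key] = (row["Name"], count)
    return best

top = top_names(SAMPLE)
print(top)
```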

  9. Data from: Data cleaning and enrichment through data integration: networking...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Feb 25, 2025
    Cite
    Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar (2025). Data cleaning and enrichment through data integration: networking the Italian academia [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbwj
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar
    Description

    We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts.

    The proposed network is built starting from two distinct data sources:

    • the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets)
    • the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).

    By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset and authors with no match (e.g., because of not being part of an Italian university) have been discarded. The remaining authors will compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one pap...

    # Data cleaning and enrichment through data integration: networking the Italian academia

    https://doi.org/10.5061/dryad.wpzgmsbwj

    Manuscript published in Scientific Data with DOI .

    Description of the data and file structure

    This repository contains two main data files:

    • edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);
    • Coauthorship_Network_AGG.graphml, the full network in GraphML format.

    along with several supplementary data, listed below, useful only to build the network (i.e., for reproducibility only):

    • University-City-match.xlsx, an Excel file that maps the name of a university against the city where its respective headquarter is located;
    • Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.
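
    The edge rule described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: it reads the truncated sentence as "two authors are connected if they share at least one paper", which is an assumption, and the author identifiers and the `coauthorship_edges` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(papers):
    """Count shared papers for every unordered pair of co-authors.

    `papers` is an iterable of author-identifier lists, one list per paper.
    An edge exists for a pair as soon as its count is at least 1.
    """
    edges = Counter()
    for authors in papers:
        # De-duplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical author lists for three papers.
papers = [["A", "B", "C"], ["A", "B"], ["D"]]
edges = coauthorship_edges(papers)
print(dict(edges))
```

    Keeping the count of shared papers per pair, rather than a plain set of pairs, leaves room for edge weights; a single-author paper contributes no edges.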

    Description of the main data files

    The `Coauthorship_Networ...

  10. 📝 Dataset containing 479k English words

    • kaggle.com
    zip
    Updated Nov 28, 2023
    Cite
    BwandoWando (2023). 📝 Dataset containing 479k English words [Dataset]. https://www.kaggle.com/bwandowando/479k-english-words
    Explore at:
    Available download formats: zip (3742977 bytes)
    Dataset updated
    Nov 28, 2023
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a direct download and copy of the dataset from https://github.com/dwyl/english-words/. As the creator of the GitHub project said:

    A text file containing over 466k English words.

    While searching for a list of english words (for an auto-complete tutorial) I found: https://stackoverflow.com/questions/2213607/how-to-get-english-language-word-database which refers to https://www.infochimps.com/datasets/word-list-350000-simple-english-words-excel-readable (archived).

    No idea why infochimps put the word list inside an excel (.xls) file.

    I pulled out the words into a simple new-line-delimited text file. Which is more useful when building apps or importing into databases etc.

    Copyright still belongs to them.

    Files you may be interested in

    • words.txt- contains all words.
    • words_alpha.txt- contains only [[:alpha:]] words (words that only have letters, no numbers or symbols). If you want a quick solution choose this.
    • words_dictionary.json- contains all the words from words_alpha.txt as json format. If you are using Python, you can easily load this file and use it as a dictionary for faster performance. All the words are assigned with 1 in the dictionary.
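
    Loading the JSON file as a dictionary gives constant-time membership checks, which is the use the description hints at. The sketch below uses a tiny inline JSON string in place of `words_dictionary.json` so that it runs on its own; the three sample words and the `is_word` and `complete` helpers are hypothetical.

```python
import json

# Hypothetical miniature stand-in for words_dictionary.json (word -> 1).
SAMPLE_JSON = '{"apple": 1, "banana": 1, "cherry": 1}'
words = json.loads(SAMPLE_JSON)

def is_word(candidate, dictionary):
    """Case-insensitive membership test against the word dictionary."""
    return candidate.lower() in dictionary

def complete(prefix, dictionary, limit=5):
    """Return up to `limit` words starting with `prefix`, alphabetically."""
    prefix = prefix.lower()
    return sorted(w for w in dictionary if w.startswith(prefix))[:limit]

print(is_word("Apple", words), complete("b", words))
```

    With the real file, `json.load(open("words_dictionary.json"))` would replace the inline sample, assuming the file is in the working directory. The linear prefix scan in `complete` is fine for a sketch; a sorted list with binary search would scale better for auto-complete.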

    Note

    There are two other datasets here in Kaggle that I've seen which are

    But they don't seem to contain the latest updates made from Github, thus I'm uploading it and will use this dataset for a notebook that I will be writing here in Kaggle

    Images

    • created using Bing Image Creator
  11. Appendix 1 on electoral districts for Greece

    • dataverse.harvard.edu
    Updated Jun 9, 2024
    Cite
    Angeliki Konstantinidou (2024). Appendix 1 on electoral districts for Greece [Dataset]. http://doi.org/10.7910/DVN/GXIFTH
    Explore at:
    Dataset updated
    Jun 9, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Angeliki Konstantinidou
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Greece
    Description

    This dataset contains the list and classification of electoral districts mentioned in the national and regional datasets. The Excel file version provides the information on districts for both datasets in a single file in two sheets for ease of use. The CSV files (UTF-8) provide the information for each sheet in two separate files.

  12. Supplementary File 1 is an Excel spreadsheet containing a list of molecules...

    • plos.figshare.com
    xlsx
    Updated May 28, 2025
    Cite
    Durgesh Ameta; Surendra Kumar; Rishav Mishra; Laxmidhar Behera; Aniruddha Chakraborty; Tushar Sandhan (2025). Supplementary File 1 is an Excel spreadsheet containing a list of molecules found in IGD, along with the molecules of Subset-IGD. [Dataset]. http://doi.org/10.1371/journal.pone.0322514.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 28, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Durgesh Ameta; Surendra Kumar; Rishav Mishra; Laxmidhar Behera; Aniruddha Chakraborty; Tushar Sandhan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary File 1 is an Excel spreadsheet containing a list of molecules found in IGD, along with the molecules of Subset-IGD.

  13. Data from: List of size fractionated eukaryotic plankton community samples...

    • doi.pangaea.de
    • search.dataone.org
    zip
    Updated Feb 20, 2015
    Cite
    Colomban De Vargas; Participants Tara Oceans Expedition; Coordinators Tara Oceans Consortium (2015). List of size fractionated eukaryotic plankton community samples and associated metadata (Database W1) [Dataset]. http://doi.org/10.1594/PANGAEA.843017
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 20, 2015
    Dataset provided by
    PANGAEA
    Authors
    Colomban De Vargas; Participants Tara Oceans Expedition; Coordinators Tara Oceans Consortium
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The present data set provides an Excel file in a zip archive. The file lists 334 samples of size-fractionated eukaryotic plankton community with a suite of associated metadata (Database W1). Note that while most samples represented the piconano- (0.8-5 µm, 73 samples), nano- (5-20 µm, 74 samples), micro- (20-180 µm, 70 samples), and meso- (180-2000 µm, 76 samples) planktonic size fractions, some represented different organismal size fractions: 0.2-3 µm (1 sample), 0.8-20 µm (6 samples), 0.8 µm - infinity (33 samples), and 3-20 µm (1 sample). The table contains the following fields: a unique sample sequence identifier; the sampling station identifier; the Tara Oceans sample identifier (TARA_xxxxxxxxxx); an INSDC accession number for retrieving raw sequence data from the major nucleotide databases (short read archives at EBI, NCBI or DDBJ); the depth of sampling (Subsurface - SUR or Deep Chlorophyll Maximum - DCM); the targeted size range; the sequence template (either DNA, or WGA/DNA if DNA extracted from the filters was Whole Genome Amplified); the latitude of the sampling event (decimal degrees); the longitude of the sampling event (decimal degrees); the time and date of the sampling event; the device used to collect the sample; the logsheet event corresponding to the sampling event; and the volume of water sampled (liters). Then follows information on the bioinformatics cleaning pipeline shown in Figure W2 of the supplementary literature publication: the number of merged pairs present in the raw sequence file; the number of those sequences matching both primers; the number of sequences after quality-check filtering; the number of sequences after chimera removal; and finally the number of sequences after selecting only barcodes present in at least three copies in total and in at least two samples.
    Finally, the following are given for each sequenced sample: the number of distinct sequences (metabarcodes); the number of OTUs; the average number of barcodes per OTU; the Shannon diversity index based on barcodes for each sample (URL of W4 dataset in PANGAEA); and the Shannon diversity index based on each OTU (URL of W5 dataset in PANGAEA).
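
    As a quick consistency check on the sample counts quoted in the description, the per-fraction numbers can be summed and compared with the stated total of 334. This is only a sketch of the arithmetic: the counts are copied from the text, and the dictionary names are illustrative, not fields of the Excel file.

```python
# Sample counts per size fraction, copied from the description.
main_fractions = {
    "piconano (0.8-5 µm)": 73,
    "nano (5-20 µm)": 74,
    "micro (20-180 µm)": 70,
    "meso (180-2000 µm)": 76,
}
other_fractions = {
    "0.2-3 µm": 1,
    "0.8-20 µm": 6,
    "0.8 µm - infinity": 33,
    "3-20 µm": 1,
}

# Sum each group, then combine for the overall number of samples.
main_total = sum(main_fractions.values())
other_total = sum(other_fractions.values())
grand_total = main_total + other_total

print(main_total, other_total, grand_total)
```

    The four main fractions and the four other fractions together account for all 334 listed samples.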

  14. Sales Dashboard in Microsoft Excel

    • kaggle.com
    zip
    Updated Apr 14, 2023
    Cite
    Bhavana Joshi (2023). Sales Dashboard in Microsoft Excel [Dataset]. https://www.kaggle.com/datasets/bhavanajoshij/sales-dashboard-in-microsoft-excel/discussion
    Explore at:
    Available download formats: zip (253363 bytes)
    Dataset updated
    Apr 14, 2023
    Authors
    Bhavana Joshi
    Description

    This interactive sales dashboard is built in Excel, using Slicers, Pivot Tables, and Pivot Charts, for B2C businesses such as Dmart, Walmart, Amazon, shops, and supermarkets.

    Dashboard Overview

    1. Sales dashboard ==> designed for B2C businesses such as Dmart, Walmart, Amazon, shops, and supermarkets.
    2. Slicers ==> used to drill down into the data by year, by month, by sales type, and by mode of payment.
    3. Total Sales/Total Profits ==> total sales, total profit, and profit percentage are combined in a monthly format; each can be hidden or unhidden to view it individually or for comparison.
    4. Product Visual ==> shows product-wise sales for the selected period. Only 10 products are visible at a glance; scroll up and down to view other products in the list.
    5. Daily Sales ==> shows day-wise sales (area chart).
    6. Sales Type/Payment Mode ==> shows the percentage contribution to sales by type of selling and mode of payment.
    7. Top Product & Category ==> shows the top-selling product and product category.
    8. Category ==> shows the category-wise sales contribution.

    Datasheets Overview

    1. The dataset has a master data sheet (a catalog), stored as a table.
    2. The first column is the product ID; each value in this column is unique.
    3. The second column is the product name. A single column could serve for both, but they are kept separate because product names can repeat while other parameters, such as price or supplier, differ.
    4. The third column is the product category, such as cosmetics, foods, drinks, or electronics.
    5. The fourth column is the unit of measure (UOM), which you can update to suit your products.
    6. The last two columns are the buying price and selling price, that is, the unit purchase price and unit selling price.

    Input Sheet

    The first column is the date of sale. The second column is the product ID. The third column is the quantity. The fourth column is the sales type: direct selling, purchased by a wholesaler, or ordered online. The fifth column is the mode of payment, which is online or cash. You can update these two as required. The last column is the discount percentage; if you want to offer a discount, add it here.

    Analysis Sheet: where all backend calculations are performed.
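
    The description does not show the formulas used in the analysis sheet, so the following is only a sketch of how the master sheet and input sheet could be combined to get total sales, total profit, and a profit ratio. The product IDs, prices, and rows are made-up illustration values, and the definitions used (sales = quantity × selling price × (1 − discount), profit ratio = profit / sales) are assumptions, not the author's confirmed calculation.

```python
# Master sheet (catalog): product ID -> unit buying and selling price.
# All IDs and prices here are hypothetical illustration values.
catalog = {
    "P0001": {"buying_price": 80.0, "selling_price": 100.0},
    "P0002": {"buying_price": 40.0, "selling_price": 50.0},
}

# Input sheet rows: date, product ID, quantity, sales type,
# payment mode, discount percentage.
sales_rows = [
    ("2022-04-28", "P0001", 2, "Direct Sales", "Cash", 0.0),
    ("2022-04-28", "P0002", 4, "Online", "Online", 10.0),
]


def summarize(rows, catalog):
    """Return (total sales, total profit, profit ratio) for the given rows."""
    total_sales = 0.0
    total_cost = 0.0
    for _date, product_id, quantity, _sales_type, _payment_mode, discount_pct in rows:
        item = catalog[product_id]  # look up unit prices by product ID
        # Assumed definition: the discount reduces the selling price only.
        total_sales += quantity * item["selling_price"] * (1 - discount_pct / 100)
        total_cost += quantity * item["buying_price"]
    profit = total_sales - total_cost
    return total_sales, profit, profit / total_sales


total_sales, total_profit, profit_ratio = summarize(sales_rows, catalog)
print(f"sales={total_sales:.2f} profit={total_profit:.2f} ratio={profit_ratio:.1%}")
```

    Keeping prices in the catalog and only the product ID in each sales row mirrors the split between the master sheet and the input sheet described above.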

    These are the four sheets, each with a different task.

    A sales dashboard enables organizations to visualize their real-time sales data and boost productivity.

    A dashboard is a useful tool that brings all the data together as charts, graphs, statistics, and other visualizations that support data-driven decision making.

    Questions & Answers

    1. What profit ratio of sales is displayed for 2021 and 2022? ==> The total profit ratio of sales is 19% in 2021, with large sales of PRODUCT42, and 22% in 2022, with large sales of PRODUCT30.
    2. Which product has the largest sales in 2021 and 2022? ==> The top product in 2021 is PRODUCT42, with total sales of $12,798; in 2022 it is PRODUCT30, with total sales of $13,888.
    3. In the area chart, which product sold the most on 28 April 2022? ==> The largest sales on 28 April 2022 are for PRODUCT14, with a profit ratio of 24%.
    4. Which sales types and payment modes are present? ==> The sales type and payment mode visuals show the percentage contribution to sales by type of selling and mode of payment. The sales types are Direct Sales with 52%, Online Sales with 33%, and Wholesaler with 15%. The payment modes, Online and Cash, are equally distributed at 50% each.
    5. In which month of 2022 are direct sales highest? ==> The monthly format makes this easy to identify: direct sales are highest in November, at 28%, compared with other months.
    6. Which payment mode is received most in 2021 and in 2022? ==> In 2021, cash payments lead with 52%, compared with 48% for online transactions. Cash payments are highest in March, July, and October, with direct sales at 42%, online at 45%, and wholesaler at 13%, and with large sales of PRODUCT24. ==> In 2022, online payments lead with 52%, compared with 48% for cash payments. Online payments are highest in January, September, and December, with direct sales at 45%, online at 37%, and whole...
  15. Update of the Xylella spp. host plant database

    • explore.openaire.eu
    • zenodo.org
    Updated Jun 23, 2021
    Cite
    European Food Safety Authority (2021). Update of the Xylella spp. host plant database [Dataset]. http://doi.org/10.5281/zenodo.1339343
    Explore at:
    Dataset updated
    Jun 23, 2021
    Authors
    European Food Safety Authority
    Description

    Following a request from the European Commission, in 2018 EFSA released a renovated database of host plant species of Xylella spp. (including both species X. fastidiosa and X. taiwanensis) together with a scientific report (EFSA, 2018). EFSA was tasked to maintain and update this database periodically. In May 2021, EFSA released the fourth update of the Xylella spp. host plant database (VERSION 4) with information retrieved from literature search up to December 2020, Europhyt outbreak notifications up to 18 March 2021, and communications of research groups and national authorities (EFSA, 2021). The protocol applied for the extensive literature review, data collection and reporting, as well as results and lists of host plants, are described in detail in the related scientific report (EFSA, 2021). The overall number of Xylella spp. host plants determined with at least two different detection methods or positive with one method (either sequencing or pure culture isolation) now reaches 385 plant species, 179 genera and 67 families (category A – see section 2.4.2 of EFSA (2021)). These numbers rise to 638 plant species, 289 genera and 87 families if considered regardless of the detection method applied (category E – see section 2.4.2 of EFSA (2021)). The Excel files attached here represent VERSION 4 of the Xylella spp. host plants database. For a detailed description of the information included in the database, please consult the related scientific report (EFSA, 2021). The Excel file “Xylella spp. host plants database – VERSION 4” contains several sheets: the LEGENDA (with extensive description of each table), the full detailed raw data of the Xylella spp. host plant database (sheet “observation”) and several examples of data extraction. Additional Excel files contain the lists of host plant species of X. fastidiosa (subsp. unknown (i.e. not reported), fastidiosa, multiplex, pauca, morus, sandyi, tashke, fastidiosa/sandyi) and X. taiwanensis infected naturally, artificially and under unspecified conditions, and according to different categories (A, B, C, D, E – see section 2.4.2 of EFSA (2021)). The Excel file “new_host_plant_species_v4” contains the list of new host plant species added to the database in this fourth update.

    Question number: EFSA-Q-2017-00221
    Correspondence: alpha@efsa.europa.eu

    Bibliography:
    EFSA (European Food Safety Authority), 2018. Scientific report on the update of the Xylella spp. host plant database. EFSA Journal 2018;16(9):5408, 87 pp. https://doi.org/10.2903/j.efsa.2018.5408
    EFSA (European Food Safety Authority), Delbianco A, Gibin D, Pasinato L and Morelli M, 2021. Scientific report on the update of the Xylella spp. host plant database – systematic literature search up to 31 December 2020. EFSA Journal 2021;19(6):6674, 70 pp. https://doi.org/10.2903/j.efsa.2021.6674

  16. Commission Model examples

    • kaggle.com
    zip
    Updated Dec 22, 2021
    Cite
    Lamar McMillan (2021). Commission Model examples [Dataset]. https://www.kaggle.com/datasets/lamarmcmillan/commission-model-examples/code
    Explore at:
    Available download formats: zip (13275 bytes)
    Dataset updated
    Dec 22, 2021
    Authors
    Lamar McMillan
    Description

    Context

    I am showcasing a financial commissions model on Kaggle. In Excel we can use IF statements to chart rates that reward workers based on quotas. By compiling sales on a large or small scale we can easily derive the necessary compensation for workers.

    Content

    The first sheet uses simple IF statements to derive a commission payment for different rates. The sales company exceeded its quota of $95,000.00 and reached $99,343.00, which is 104.6% of the quota.

    On sheet 2 there is a detailed breakdown of individual employee rates and their deserved commission. The difference in sheet 2 is the use of nested IF statements, which can get much more complex if not catalogued properly.
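
    To illustrate how a nested IF formula maps quota attainment to a commission rate, here is a minimal Python sketch. The quota and sales figures come from the description above; the tier thresholds and rates are hypothetical and are not taken from the spreadsheet.

```python
def quota_attainment(sales, quota):
    """Sales as a fraction of quota (1.0 means the quota was met exactly)."""
    return sales / quota


def commission_rate(attainment):
    """Tiered rate, written the way a nested IF would evaluate it.

    The thresholds and rates are hypothetical illustration values.
    """
    if attainment >= 1.10:
        return 0.08
    elif attainment >= 1.00:
        return 0.05
    elif attainment >= 0.90:
        return 0.02
    return 0.0


sales, quota = 99_343.00, 95_000.00  # figures quoted in the description
attainment = quota_attainment(sales, quota)
rate = commission_rate(attainment)
commission = round(sales * rate, 2)

print(f"attainment={attainment:.1%} rate={rate:.0%} commission=${commission:,.2f}")
```

    Checking the highest tier first keeps each branch to a single comparison, which is the same ordering that keeps a nested IF readable in Excel.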

    Acknowledgements

    I credit two YouTube guides heavily for these models. Here are the links:
    https://www.youtube.com/watch?v=bkrSVS7-CYo&list=PLQnuraB9JKXdUlDVZtcfG2_sO_uL_XyMm&index=4
    https://www.youtube.com/watch?v=0Ahqr6Xdkos&list=PLQnuraB9JKXdUlDVZtcfG2_sO_uL_XyMm&index=12

    Inspiration

    Thanks for reading, and enjoy!

  17. UK_County and Region

    • kaggle.com
    zip
    Updated Nov 13, 2025
    Cite
    Kristína Kazimierová (2025). UK_County and Region [Dataset]. https://www.kaggle.com/datasets/kristnakazimierov/uk-county-and-region
    Explore at:
    Available download formats: zip (10291 bytes)
    Dataset updated
    Nov 13, 2025
    Authors
    Kristína Kazimierová
    Area covered
    United Kingdom
    Description

    The dataset contains a list of some UK counties and their classification into regions. The Excel file has two columns: county and region.


Cite
mukul (2020). Positive and Negative Word List.rar [Dataset]. https://www.kaggle.com/mukulkirti/positive-and-negative-word-listrar

Positive and Negative Word List.rar

List of the words that have negative or positive sense

Explore at:
Available download formats: zip (84813 bytes)
Dataset updated
Sep 12, 2020
Authors
mukul
Description

Context

This dataset was created to make lists of words with a positive or negative sense available for research. It is intended for language-related research such as NLP, AI, behaviour detection, and more. It can help to:
1. Research whether the language used in scientific abstracts has skewed towards strikingly positive and negative words over time.
2. Normalise the yearly frequencies of positive, negative, and neutral words, plus 100 randomly selected words, by the total number of abstracts.
3. Run subanalyses such as pattern quantification of individual words, specificity for selected high-impact journals, and comparison between author affiliations within or outside countries with English as the official majority language.

In one analysis, frequency patterns were compared with 4% of all books ever printed and digitised, using the Google Books Ngram Viewer. The main outcome measures were the frequencies of positive and negative words in abstracts compared with frequencies of words with a neutral and random connotation, expressed as relative change since 1980, so the dataset can help in these tasks too. In the results, the absolute frequency of positive words increased from 2.0% (1974-80) to 17.5% (2014), a relative increase of 880% over four decades. All 25 individual positive words contributed to the rise, particularly the words “robust,” “novel,” “innovative,” and “unprecedented,” which increased in relative frequency by up to 15 000%. Comparable but less pronounced results were obtained when restricting the analysis to selected journals with high impact factors. Authors affiliated to an institute in a non-English speaking country used significantly more positive words. Negative word frequencies increased from 1.3% (1974-80) to 3.2% (2014), a relative increase of 257%. Over the same period, no apparent increase was found in neutral or random word use, or in the frequency of positive word use in published books. This lexicographic analysis indicates that scientific abstracts are currently written with more positive and negative words, and provides an insight into the evolution of scientific writing. Apparently scientists look on the bright side of research results. This dataset can therefore play a major role in research.

Content

About the dataset:
1. The dataset is in Excel file format.
2. The dataset has two columns: (I) Negative Word List and (II) Positive Word List.
3. The dataset contains a total of 4699 positive words and 4722 negative words.
4. The data was collected from different sources.
5. The dataset has some null (NaN) values.
6. Please check the data before use.
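
Because the two columns have different lengths and contain some null values, each column is probably best cleaned on its own before use. The sketch below uses a few made-up rows in place of the spreadsheet and plain Python rather than any particular Excel reader; the helper name `clean_column`, the lowercasing, and the de-duplication are illustrative choices, not part of the dataset.

```python
import math


def clean_column(values):
    """Return the non-empty words from one column, lowercased, in first-seen order."""
    seen = set()
    words = []
    for value in values:
        # Skip empty cells: None or a float NaN.
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        word = str(value).strip().lower()
        # Skip blank strings and words already collected.
        if not word or word in seen:
            continue
        seen.add(word)
        words.append(word)
    return words


# Hypothetical rows standing in for the spreadsheet: (negative, positive).
rows = [
    ("bad", "good"),
    ("awful", float("nan")),
    (None, "Robust"),
    ("bad", " novel "),
]

negative_words = clean_column(row[0] for row in rows)
positive_words = clean_column(row[1] for row in rows)

print(negative_words)
print(positive_words)
```

Cleaning the columns separately avoids dropping a valid word just because the cell beside it is empty.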

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Just to see how it can help in many NLP-related tasks.
