MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Filename: SEO_data.csv
Size: 56.63 MB
Rows: ~100,000+
Columns: 7
Language: Primarily English (may contain multilingual snippets)
This dataset contains structured data scraped from Google Search Engine Results Pages (SERPs), specifically curated for SEO and machine learning research. It includes search rankings and metadata for various keywords, capturing how websites rank and present their content on search engines.
| Column Name | Description |
|---|---|
| words | The search keyword or query entered into Google |
| rank | The result's position on the search engine results page (1 = top) |
| title | The meta title of the page |
| h1 | The primary <h1> tag from the page (if available) |
| snippet | The search result snippet/description shown on Google |
| links | The URL of the ranked result |
| total_result | The total number of search results Google reports for the query |
| words | rank | title | h1 | snippet | links | total_result |
|---|---|---|---|---|---|---|
| Artificial intelligence | 1 | Beginning Your Journey to Implementing Artificial Intelligence | Beginning Your Journey... | Gérer les éditeurs grâce à des services... | https://www.softwareone.com/... | 776,000,000 |
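A minimal sketch for loading the file with Python's standard `csv` module. It assumes `SEO_data.csv` has a header row with exactly the seven column names above, whole-number `rank` values, and `total_result` stored either as plain digits or with thousands separators as in the sample row. The two rows in the usage example are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional


def parse_total(value: str) -> Optional[int]:
    """Turn a count such as '776,000,000' into an int; return None if it is not numeric."""
    digits = value.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def load_serps(handle) -> Dict[str, List[dict]]:
    """Group SERP rows by keyword ('words' column), each group sorted by rank (1 = top)."""
    by_keyword: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(handle):
        row["rank"] = int(row["rank"])  # assumes ranks are whole numbers
        row["total_result"] = parse_total(row["total_result"])
        by_keyword[row["words"]].append(row)
    for rows in by_keyword.values():
        rows.sort(key=lambda r: r["rank"])
    return dict(by_keyword)


# Hypothetical two-row sample; with the real file use:
# with open("SEO_data.csv", newline="", encoding="utf-8") as f: serps = load_serps(f)
sample = io.StringIO(
    "words,rank,title,h1,snippet,links,total_result\n"
    'Artificial intelligence,2,B,B,s,https://b.example,"776,000,000"\n'
    'Artificial intelligence,1,A,A,s,https://a.example,"776,000,000"\n'
)
serps = load_serps(sample)
print([r["links"] for r in serps["Artificial intelligence"]])  # ['https://a.example', 'https://b.example']
```

Grouping by keyword first makes it easy to compare how titles, `h1` tags, and snippets change with rank for the same query.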
Enjoy
"*******" was the most frequently searched keyword on Google worldwide, with over ***** million monthly online searches during the analyzed period of January to March in 2025. Furthermore, the search resulted in more than ***** million website visits, or more than **** percent of all traffic. With *** million monthly searches, "***" was the second most popular keyword, and "***********" came in third place with about ****** million searches per month.
This dataset was created by Harshii_Sharma
Between June 2022 and March 2023, traffic volume for the keyword "AI" more than tripled, rising from around 7.9 million monthly searches to more than 30.4 million in the last month of the measured period. General interest in artificial intelligence (AI) exploded in markets such as the United States by the end of 2022. Likewise, interest in the application programming interfaces (APIs) and plugins of artificial intelligence solutions, especially those of ChatGPT, has increased sharply since the tool's release in November 2022.
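Dividing the two monthly figures quoted above gives the implied growth factor over the period:

$$\frac{30.4 \times 10^{6}}{7.9 \times 10^{6}} \approx 3.85$$

In other words, monthly search volume in March 2023 was roughly 3.85 times its June 2022 level.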
The artificial intelligence market
Valued at around 142.3 billion U.S. dollars in 2022, the artificial intelligence market is one of the most promising tech segments for the rest of the decade, with more than five billion U.S. dollars invested in startups, the most notable being the Californian company OpenAI and its flagship application ChatGPT. Disruptive as it is, AI adoption has already raised alarms in several industries: it is likely to affect job markets, and it has raised concerns about cybercrime and other online misdeeds.
The future of online search?
More than most industries, the online search market may feel the impact of OpenAI's new tool like a global earthquake. With chatbots providing search results in a dialogue format, the trend of AI-powered search engines unleashed by ChatGPT threw giant companies such as Google and Microsoft into a race with startups and other competitors to present the best candidate for this disruptive (and experimental) online solution.
The data is obtained from Helium 10, a popular Amazon seller tool. Helium 10 is a reputable tool for many Amazon sellers and supplies a lot of data for analyzing products, keywords, and markets.
Amazon does not share its data with third parties. Hence, the data does not reflect Amazon's actual values.
The data covers the most searched keywords in the Amazon Electronics category. It was created in January 2024, so it reflects data up to that time.
The data contains 21 columns and more than 4,000 phrases, with details such as Search Volume, Fulfillment Type, Size Tier, and Variation.
The data is suitable for educational purposes: you can use it to practice exploratory data analysis, data visualization, and data manipulation.
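As a starting point for the exploratory analysis suggested above, the following sketch totals search volume per fulfillment type with Python's `csv` module. The header names `Search Volume` and `Fulfillment Type` are assumed from the column descriptions and may differ in the actual file; the `Keyword Phrase` header, the sample rows, and their values (`FBA`, `FBM`) are hypothetical.

```python
import csv
import io
from collections import Counter


def volume_by_group(handle, group_col="Fulfillment Type", volume_col="Search Volume"):
    """Sum search volume per group; rows with a missing or non-numeric volume are skipped."""
    totals = Counter()
    for row in csv.DictReader(handle):
        raw = (row.get(volume_col) or "").replace(",", "").strip()
        if raw.isdigit():
            totals[row.get(group_col) or "unknown"] += int(raw)
    return totals


# Hypothetical rows; the real export has 21 columns.
sample = io.StringIO(
    "Keyword Phrase,Search Volume,Fulfillment Type\n"
    'usb c cable,"12,500",FBA\n'
    "hdmi splitter,3400,FBM\n"
    "wireless earbuds,8100,FBA\n"
)
for group, volume in volume_by_group(sample).most_common():
    print(group, volume)
```

Passing a different `group_col` (for example a size-tier column) reuses the same aggregation for other breakdowns.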
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Objectives: To investigate the search strategies and keyword searches used in 95 residential care reviews.
Study design: Methodological study (cross-sectional).
Methods: First, I attempted to download the full-text versions of all 95 residential care reviews identified in a recently published project. I then searched the full texts obtained to identify the database search strategies used. I extracted into an Excel file all residential care keywords used in the first search strategy identified. Keywords related to kinship care were not extracted. All extracted keywords were also added to a personal list of residential care keywords if they were not already included. The titles and abstracts of all selected residential care reviews were extracted into an Excel file and analyzed using Excel's COUNTIF function to identify the most commonly occurring keywords/strings. The sensitivity of residential care keywords found in my personal list but not in any search strategy was then assessed using Excel's COUNTIF function.
Results: Among the 95 residential care reviews, 5 (5.26%) did not report a search strategy, a search strategy was mentioned but not found for 2 reviews (2.11%), and I could not access the search strategy for 5 reviews (5.26%). Keywords were not extracted from 4 reviews because of extensive use of controlled vocabulary (MeSH) or advanced search functions (adj, near, ?, etc.). The only review that did not report searches conducted in English was excluded from analysis. This left 78 review search strategies for analysis. Review authors used from 0 to 53 residential care keywords/strings (mean = 9 keywords, median = 7.5 keywords). In total, 288 unique keywords/strings were used by review authors. The 10 most commonly used keywords were: foster care (51.28%), residential care (47.44%), out of home care (29.49%), out-of-home care (24.36%), group home (20.51%), institutional care (19.23%), children's home (17.95%), child welfare (16.67%), looked after (16.67%) and looked-after (14.10%). Of these, 198 keywords/strings were found only once. The keywords most commonly found in the titles and abstracts of residential care reviews were: foster, foster care, resident, residential, placement, in care, residential care, institution, out-of-home and out-of-home care. Four keywords/strings were found in more than 4% of the titles and abstracts of residential care reviews but could not be identified in the reviews' search strategies: care setting, out of care, residential youth care and care placement.
Funding: No funding was received for this work.
Registration and study protocol: See https://osf.io/7dqkp
Data and materials: See https://osf.io/7dqkp. All other data should be included in this manuscript.
Keywords: Children's homes, residential care, electronic searches, systematic review
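The COUNTIF step can be approximated in Python. In Excel, a criterion such as `"*foster care*"` counts cells that contain the phrase anywhere, ignoring case; the sketch below reproduces that substring behaviour over a list of titles/abstracts and reports the percentage of records containing each keyword, rounded to two decimals as in the results. The example titles are hypothetical, and the sketch ignores COUNTIF's `?` and `~` wildcard syntax.

```python
def keyword_share(records, keywords):
    """Percentage of records containing each keyword (case-insensitive substring match),
    mirroring =COUNTIF(range, "*keyword*") / COUNTA(range) * 100 in Excel."""
    if not records:
        return {}
    lowered = [r.lower() for r in records]
    return {
        kw: round(100 * sum(kw.lower() in r for r in lowered) / len(lowered), 2)
        for kw in keywords
    }


# Hypothetical titles; note that "resident" also matches "residential", as "*resident*" would.
titles = [
    "Foster care outcomes for adolescents",
    "Residential care and mental health",
    "Out-of-home care: a systematic review",
    "Placement stability in foster families",
]
print(keyword_share(titles, ["foster", "resident", "out-of-home care"]))
# {'foster': 50.0, 'resident': 25.0, 'out-of-home care': 25.0}
```

Because matching is by substring, a short stem such as "foster" always scores at least as high as any longer phrase that contains it, which is consistent with both "foster" and "foster care" appearing among the most frequent keywords.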
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Quân Phạm Ngọc
Released under MIT
2026 guide to selecting, prioritising and using organic SEO keywords without over-optimising, using a method focused on ROI.
Non-traditional data signals from social media and employment platforms for KYYWF stock analysis.
BigDBM's Purchase Intent Data transforms how businesses understand and engage with their customers by providing a comprehensive, real-time view of buyer purchase intent across both US consumer and B2B markets. With over 20 years of expertise in building identity graphs, our platform processes more than 110 million distinct hashed email addresses daily, delivering actionable insights that drive measurable ROI.
Our proprietary methodology combines data from multiple live streams and maps website domains and emails to IAB classification codes, giving you structured insights into market interests and purchase intent. Through advanced natural language processing and a custom five-tiered taxonomy, we extract granular keywords while maintaining broad category classification for flexible targeting.
Our unique intent intensity scoring quantifies purchase likelihood based on frequency and consistency of interest, while timestamp tracking reveals behavioral shifts and trends over time. With robust privacy compliance and ethical sourcing practices, BigDBM enables organizations to make smarter, faster decisions across customer acquisition, retention, and engagement through applications in account-based marketing, audience expansion, data enrichment, and programmatic advertising.
We do not provide any phone details from Colorado residents.
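BigDBM does not publish its scoring formula. Purely to illustrate what an intensity score "based on frequency and consistency of interest" could look like, the hypothetical sketch below scores one entity's interest in one category over a trailing window: consistency is the share of weeks with at least one signal, frequency is the signal count capped at a saturation value, and the score is their product. The window length, the cap, and the product itself are invented assumptions, not BigDBM's method.

```python
from datetime import date, timedelta


def intent_intensity(signal_dates, as_of, weeks=4, saturation=10):
    """Hypothetical score in [0, 1] for one entity/category pair.

    consistency = share of the trailing `weeks` weeks with at least one signal
    frequency   = number of signals in the window, capped at `saturation`, scaled to [0, 1]
    """
    start = as_of - timedelta(weeks=weeks)
    recent = [d for d in signal_dates if start < d <= as_of]
    if not recent:
        return 0.0
    active_weeks = {(as_of - d).days // 7 for d in recent}  # week index 0 = most recent week
    consistency = len(active_weeks) / weeks
    frequency = min(len(recent), saturation) / saturation
    return round(consistency * frequency, 3)


signals = [date(2025, 1, 29), date(2025, 1, 28), date(2025, 1, 20), date(2025, 1, 10), date(2024, 12, 15)]
print(intent_intensity(signals, as_of=date(2025, 1, 29)))  # 0.3
```

Multiplying the two factors means that a burst of signals in a single week, or a single signal in each of several weeks, both score lower than steady, repeated interest.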
A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
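The description does not define what a "relational model" or a "gleaning model" is. As a stand-in only, the sketch below models every subset and the query as bag-of-words term-frequency vectors, uses cosine similarity for the comparison step, and outputs the identifiers of subsets whose score reaches a threshold. The vector models, the threshold value, and the sample subsets are all assumptions.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (stand-in for a 'relational model')."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyterm_search(subsets, query, threshold=0.1):
    """Return identifiers of subsets whose model is similar enough to the query model."""
    models = {sid: term_vector(text) for sid, text in subsets.items()}
    query_model = term_vector(query)  # stand-in for the 'gleaning model' of the query
    scores = {sid: cosine(query_model, model) for sid, model in models.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [sid for sid in ranked if scores[sid] >= threshold]


subsets = {
    "doc-a": "solar panel efficiency",
    "doc-b": "wind turbine blade design",
    "doc-c": "solar wind interaction",
}
print(keyterm_search(subsets, "solar efficiency"))  # ['doc-a', 'doc-c']
```

Building the subset models once and reusing them for every query mirrors the order of steps above, where the relational models are provided before any query arrives.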
Original quantitative research dataset analyzing AI search query fan-out behavior across 173,902 URLs and 10,000 keywords, using citations from the Google AI Mode, ChatGPT, and Perplexity platforms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides material relating to a questionnaire entitled "Semantic Web: Perspectives". This questionnaire was addressed to the W3C Semantic Web mailing list (semantic-web@w3.org) and was open to responses from May 12th to May 25th, 2019. A total of 113 responses were collected in this time. The following files are provided:
public-comments.txt: provides the public comments of respondents in plain text;
questionnaire-form.pdf: illustrates the design of the questionnaire, including questions, types of responses permitted, etc.;
questionnaire-responses.tsv: lists the individual responses (without private comments) as a tab-separated values file;
success-keywords.xlsx: provides a spreadsheet mapping success story responses to a list of keywords, further providing statistics on these keywords;
wordcloud-bw.svg: provides a word-cloud of success-story keywords in black & white;
wordcloud-colour.svg: provides a word-cloud of success-story keywords in colour.
The word-clouds were produced using Jason Davies' online service, copying and pasting the keywords from the success-keywords.xlsx spreadsheet (e.g., Column A, Sheet Statistics) into the text field; the following settings were selected: Orientations from 0° to 0°, Spiral: Rectangular; Scale: n; Number of words: 400; One word per line: ticked; Font: Patua One (must be installed locally beforehand). The resulting SVG files were later modified in a text editor to add a link to the font used, to tighten the bounding box, and to produce a black & white version.
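With "One word per line" ticked, each pasted line is treated as a single term, and "Scale: n" presumably maps frequency to word size linearly. The sketch below illustrates that idea: it counts one-term-per-line input, keeps the 400 most frequent terms, and maps counts linearly onto a font-size range. The size range (10 to 60) and the sample keywords are assumptions, and this is not the online tool's actual code.

```python
from collections import Counter


def linear_font_sizes(lines, max_words=400, min_size=10.0, max_size=60.0):
    """Map each distinct non-empty line (one term per line) to a font size that grows
    linearly with its frequency; only the `max_words` most frequent terms are kept."""
    counts = Counter(line.strip() for line in lines if line.strip())
    top = counts.most_common(max_words)
    if not top:
        return {}
    low, high = top[-1][1], top[0][1]
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        term: min_size + (count - low) * (max_size - min_size) / span
        for term, count in top
    }


# Hypothetical success-story keywords, one per line as pasted from the spreadsheet.
pasted = "linked data\nontologies\nlinked data\nSPARQL\nlinked data\nontologies\n"
print(linear_font_sizes(pasted.splitlines()))
# {'linked data': 60.0, 'ontologies': 35.0, 'SPARQL': 10.0}
```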
We thank the respondents for providing their input.
In Copyright: http://rightsstatements.org/vocab/InC/1.0/
2008 city statistical data integrated with administrative boundaries for prefecture-level and county-level cities.
The keyword feed is created by filtering raw data through a specified keyword configuration and allows tracking web traffic related to various topics, e.g.:
- public companies
- brands
- products

By analyzing the feed, it is possible to evaluate the popularity of, and sentiment surrounding, the chosen phrase over time.
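A minimal sketch of the filtering step, assuming each raw record is a `(date, text, sentiment)` tuple with sentiment already scored between -1 and 1; the record shape, the case-insensitive substring match, and the sample records are hypothetical. For every configured keyword it returns, per day, the number of matching records (popularity) and their mean sentiment.

```python
from collections import defaultdict
from datetime import date


def keyword_feed(records, keywords):
    """Return {keyword: {day: (matching_records, mean_sentiment)}}, days in ascending order."""
    hits = {kw: defaultdict(list) for kw in keywords}
    for day, text, sentiment in records:
        lowered = text.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                hits[kw][day].append(sentiment)
    return {
        kw: {day: (len(s), sum(s) / len(s)) for day, s in sorted(days.items())}
        for kw, days in hits.items()
    }


raw = [
    (date(2024, 1, 1), "Acme shares rise after earnings", 0.5),
    (date(2024, 1, 1), "ACME product recall announced", -0.5),
    (date(2024, 1, 2), "Acme launches a new product", 1.0),
    (date(2024, 1, 2), "Unrelated market news", 0.0),
]
for day, (count, mood) in keyword_feed(raw, ["Acme"])["Acme"].items():
    print(day, count, mood)
```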
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
The work consists of tools for the interaction between Wikidata and the OBO Foundry, and source code for using the MeSH keywords of PubMed publications to enrich biomedical knowledge in Wikidata. This work is funded by the Adapting Wikidata to support clinical practice using Data Science, Semantic Web and Machine Learning Project within the framework of the Wikimedia Foundation Research Fund.
To cite the work: Turki, H., Chebil, K., Dossou, B. F. P., Emezue, C. C., Owodunni, A. T., Hadj Taieb, M. A., & Ben Aouicha, M. (2024). A framework for integrating biomedical knowledge in Wikidata with open biomedical ontologies and MeSH keywords. Heliyon, 10(19), e38488. doi:10.1016/j.heliyon.2024.e38448.
Wikidata-OBO
tool1.py: A tool for verifying the semantic alignment between Wikidata and OBO ontologies.
frame.py: The layout of Tool 1.
tool2.py: A tool for extracting Wikidata relations between OBO ontology items.
frame2.py: The layout of Tool 2.
tool3.py: A tool for extracting multilingual language data for OBO ontology items from Wikidata.
frame4.py: The layout of Tool 3.
Wikidata-MeSH
correct_mesh2matrix_dataset.py: Source code for turning MeSH2Matrix into a smaller dataset, named MiniMeSH2Matrix, for biomedical relation classification based on the MeSH keywords of PubMed publications.
build_numpy_dataset.py: Source code for building the NumPy files for MiniMeSH2Matrix (relation type-based classification).
label_encoded.csv: A table for converting Wikidata property IDs into MeSH2Matrix class IDs.
new_encoding.csv: A table for converting Wikidata property IDs into MiniMeSH2Matrix class IDs.
super_classes_new_dataset_labels.npy: The NumPy file of the labels for the superclass-based classification.
new_dataset_labels.npy: The NumPy file of the labels for the relation type-based classification.
new_dataset_matrices.npy: The NumPy file of the MiniMeSH2Matrix matrices for biomedical relation classification.
first_level_new_data.json: The JSON file for converting relation types to superclasses.
build_super_classes.py: Source code for building the NumPy files for MiniMeSH2Matrix (superclass-based classification).
FC_MeSH_Model_57_New_Data.ipynb: A Jupyter Notebook for training a dense model to perform the relation type-based classification.
FC_MeSH_Model_57_New_Data_SuperClasses.ipynb: A Jupyter Notebook for training a dense model to perform the superclass-based classification.
new_data_best_model_1: A saved copy of the best model for the relation type-based classification.
new_data_super_classes_best_model_1: A saved copy of the best model for the superclass-based classification.
MiniMeSH2Matrix_SuperClasses_Confusion_Matrix.ipynb: A Jupyter Notebook for generating the confusion matrix for the superclass-based supervised classification.
MiniMeSH2Matrix_Supervised_Classification_Agreement.ipynb: A Jupyter Notebook for generating the matrix of agreement between the accurate predictions of the superclass-based classification and those of the relation type-based classification.
Adding_References_to_Wikidata.ipynb: A Jupyter Notebook for identifying the PubMed IDs of relevant references for unsupported Wikidata statements between MeSH terms.
MeSH_Statistics.xlsx: Statistical data about MeSH-based items and relations in Wikidata.
ref_for_unsupported_statements.csv: Relevant PubMed references retrieved for 1k unsupported Wikidata statements.
evaluate_pubmed_ref_assignment.ipynb: A Jupyter Notebook that generates statistics about reference assignment for a sample of 1k unsupported statements.
MeSH_Verification.xlsx: A list of inaccurate or duplicated MeSH IDs in Wikidata, as of August 8th, 2023.
WikiRelationsPMI.csv: A list of PMI values for the semantic relations between MeSH terms, as available in Wikidata.
WikiRelationsPMIDistribution.xlsx: Distribution of PMI values for all Wikidata relations and for specific Wikidata relation types.
WikiRelationsToVerify.xlsx: Wikidata relations needing attention because they involve Wikidata items with inaccurate MeSH IDs, cannot be found in PubMed, or have PMI values below the threshold of 2.
Mesh_part1.py: Python code that verifies the accuracy of the MeSH IDs of Wikidata items.
MeshWikiPart.py: Python code that computes the pointwise mutual information (PMI) values for Wikidata relations between MeSH keywords based on PubMed.
Demo.ipynb: A demo of the MeSH-based biomedical relation validation and classification in French.
Id_Term.json: A dict mapping MeSH descriptor IDs to Medical Subject Headings labels.
dict_mesh.json: The number of occurrences of MeSH keywords in PubMed.
finalmatrix.xlsx: A matrix of PMI values between the 5k most common MeSH keywords.
finalmatrixrev.pkl: A pickle file version of the PMI matrix.
pmi2.xlsx: A list of significant PMI associations between the 5k most common MeSH keywords, reaching a threshold of 2.
Generate5kMatrix.py: Python code that generates the PMI matrix.
clean_pmi2.py: Python code that removes the relations already available in Wikidata from pmi.xlsx.
missing_rels.xlsx: The final list of significant PMI associations that do not exist in Wikidata.
item_category.json: A dict mapping MeSH items to their MeSH tree categories.
item_categorization.py: Python code that generates a dict mapping MeSH items to their MeSH tree categories.
classification.py: Python code for classifying PMI-generated semantic relations between the most common MeSH keywords.
results.xlsx: The output of the classification of the PMI-generated semantic relations between the most common MeSH keywords.
ClassificationStats.ipynb: A Jupyter Notebook for generating statistical data about the classification.
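The PMI scripts are described only at a high level. The sketch below shows one common way to compute pointwise mutual information from document co-occurrence, PMI(x, y) = log2(N · c(x, y) / (c(x) · c(y))), where N is the number of publications and c counts the publications annotated with a term or a term pair, and it keeps the pairs that reach the threshold of 2. The base-2 logarithm, publication-level counting, and the toy annotations are assumptions, not details taken from the repository.

```python
import math
from collections import Counter
from itertools import combinations


def significant_pmi(publications, threshold=2.0):
    """publications: list of MeSH-term lists, one per article.
    Returns {(term_x, term_y): pmi} for co-occurring pairs with PMI >= threshold."""
    n = len(publications)
    single, pair = Counter(), Counter()
    for terms in publications:
        unique = sorted(set(terms))  # count each term or pair once per publication
        single.update(unique)
        pair.update(combinations(unique, 2))
    result = {}
    for (x, y), c_xy in pair.items():
        pmi = math.log2(n * c_xy / (single[x] * single[y]))
        if pmi >= threshold:
            result[(x, y)] = pmi
    return result


# Hypothetical annotations for eight articles.
toy = [["Asthma", "Albuterol"], ["Diabetes"], ["Diabetes", "Obesity"], ["Obesity"],
       ["Fever"], ["Cough"], ["Anemia"], ["Migraine"]]
print(significant_pmi(toy))  # {('Albuterol', 'Asthma'): 3.0}
```

Here ("Diabetes", "Obesity") co-occurs once but each term appears twice, so its PMI is log2(8 / 4) = 1 and it falls below the threshold.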
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
High-frequency keywords classified as cultural value perception (Top 20).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of models included in the MCS, at the 90% confidence level, using the and statistics and the MSE loss function.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the data that were used in a review paper "From urban data to city-scale models: A review of traffic simulation case studies". It contains the following files:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Seven analysis-ready CSV files derived from 30 years of Starbucks 10-K annual reports (FY1996–FY2025), covering store expansion data, keyword frequency analysis, LDA topic modeling results, and document-level text statistics.
These CSVs were generated from the original 10-K filings through the following steps:
1. Download 30 annual 10-K filings from SEC EDGAR (CIK: 0000829224).
2. Extract the Item 1 (Business) section from each filing.
3. Strip HTML/XBRL tags and normalize whitespace.
4. Tokenize, then compute keyword frequencies (raw and per 10K words).
5. For LDA: chunk each document into ~150-word segments (847 chunks in total), train a 7-topic LDA model, and aggregate topic proportions by year.
The raw 10-K texts are not included (file size), but are freely available from SEC EDGAR as public domain documents.
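Steps 4 and 5 (up to, but not including, model training) can be sketched with the standard library alone. The tokenizer (lowercase alphabetic tokens), single-word keyword matching, and the exact chunking rule are assumptions, since the source does not specify them; the output keys follow the `total_words`, `{keyword}`, and `{keyword}_per10k` naming used by the keyword CSV. Training the 7-topic LDA model itself would need a library such as scikit-learn or gensim and is not shown.

```python
import re

TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")  # assumed tokenizer: lowercase words, simple apostrophes


def tokenize(text):
    return TOKEN.findall(text.lower())


def keyword_row(text, keywords):
    """Raw count and per-10,000-word rate for each single-word keyword."""
    tokens = tokenize(text)
    total = len(tokens)
    row = {"total_words": total}
    for kw in keywords:
        count = tokens.count(kw)
        row[kw] = count
        row[f"{kw}_per10k"] = round(count / total * 10_000, 2) if total else 0.0
    return row


def chunk(tokens, size=150):
    """Split a token list into consecutive ~size-word segments (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


print(keyword_row("Digital mobile experience. Digital rewards!", ["digital", "mobile"]))
# {'total_words': 5, 'digital': 2, 'digital_per10k': 4000.0, 'mobile': 1, 'mobile_per10k': 2000.0}
```

Normalizing per 10,000 words keeps the counts comparable across years even though the length of the Item 1 section varies from filing to filing.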
| Source | License |
|---|---|
| SEC EDGAR 10-K filings | Public domain (US government) |
| Store counts | Extracted from 10-K Item 1 text (public domain) |
| Notebook | Theme | Link |
|---|---|---|
| Manhattan Cafe Wars | Theme 0: EDA & competitor mapping | Open |
| Starbucks 10-K NLP | Theme 1: keyword trends, LDA topics, NLP × store count | Open |
| Starbucks Spatial Clustering | Theme 2A: Moran's I, LISA, Ripley's K | Open |
| Starbucks Location Fitness | Theme 2B: demand-supply scoring & backtest | Open |
| Starbucks Data Pipeline | Pipeline: EDGAR & OSM to CSV, data quality report | Open |
Related dataset: Manhattan Café Wars: Starbucks & Subway — spatial data for Theme 0, 2A, 2B
| Column | Type | Description |
|---|---|---|
| fiscal_year | int | Fiscal year (1996–2025) |
| co_us | int | Company-operated stores in the US |
| co_international | int | Company-operated stores outside the US |
| lic_us | int | Licensed stores in the US |
| lic_international | int | Licensed stores outside the US |
| total_worldwide | int | Total store count worldwide |
| source_note | str | Data extraction note |
| co_total | int | Total company-operated stores |
| lic_total | int | Total licensed stores |
| us_total | int | Total US stores |
| intl_total | int | Total international stores |
| pct_licensed | float | Percentage of stores that are licensed |
| pct_international | float | Percentage of stores outside the US |
| yoy_growth | float | Year-over-year growth rate (%) |
| yoy_change | int | Year-over-year change in store count |
| ceo | str | CEO name for the fiscal year |
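The derived columns can presumably be recomputed from the six extracted counts. The sketch below uses the most natural definitions: sums for the totals, shares of `total_worldwide` in percent, and year-over-year differences against the previous fiscal year. These formulas, the two-decimal rounding, leaving the first year's YoY fields empty, and the sample figures are assumptions, not the dataset's documented method.

```python
def add_derived(rows):
    """Add derived store-count columns to a list of per-year dicts (sorted by fiscal_year)."""
    out, prev_total = [], None
    for r in sorted(rows, key=lambda row: row["fiscal_year"]):
        d = dict(r)
        total = r["total_worldwide"]
        d["co_total"] = r["co_us"] + r["co_international"]
        d["lic_total"] = r["lic_us"] + r["lic_international"]
        d["us_total"] = r["co_us"] + r["lic_us"]
        d["intl_total"] = r["co_international"] + r["lic_international"]
        d["pct_licensed"] = round(100 * d["lic_total"] / total, 2)
        d["pct_international"] = round(100 * d["intl_total"] / total, 2)
        if prev_total is None:  # no previous year to compare against
            d["yoy_change"] = d["yoy_growth"] = None
        else:
            d["yoy_change"] = total - prev_total
            d["yoy_growth"] = round(100 * (total - prev_total) / prev_total, 2)
        prev_total = total
        out.append(d)
    return out


# Hypothetical figures, not Starbucks data.
rows = [
    {"fiscal_year": 2001, "co_us": 110, "co_international": 30, "lic_us": 40, "lic_international": 70, "total_worldwide": 250},
    {"fiscal_year": 2000, "co_us": 100, "co_international": 20, "lic_us": 30, "lic_international": 50, "total_worldwide": 200},
]
print(add_derived(rows)[1]["yoy_growth"])  # 25.0
```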
Each keyword has two columns: {keyword} (raw count) and {keyword}_per10k (frequency per 10,000 words).
| Column | Description |
|---|---|
| fiscal_year | Fiscal year (1996–2025) |
| total_words | Total word count of the Item 1 section |
| digital / digital_per10k | "digital" occurrences |
| mobile / mobile_per10k | "mobile" occurrences |
| experience / experience_per10k | "experience" occurrences |
| china / china... |