8 datasets found
  1. Data from: Programmable Web

    • kaggle.com
    zip
    Updated Mar 26, 2025
    Rik (2025). Programmable Web [Dataset]. https://www.kaggle.com/datasets/rimkomatic/programmable-web/data
    Available download formats: zip (5490552 bytes)
    Dataset updated: Mar 26, 2025
    Authors: Rik
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    ProgrammableWeb Dataset

    Overview

    This dataset contains structured information about APIs, mashups, and categories from ProgrammableWeb, one of the most comprehensive directories of web APIs. The data has been extracted from a MySQL database and converted into CSV format for easy use in data analysis and machine learning applications.

    Dataset Contents

    The dataset is composed of multiple CSV files, each representing a different aspect of the API ecosystem:

    1. Category.csv

    Contains information about API categories.
    • ID: Unique identifier for the category.
    • Name: Name of the category.
    • PwURL: ProgrammableWeb URL for the category.
    • Amount: Count of APIs in this category (approximate).

    2. ApiSketch.csv

    Stores basic details about APIs before full data retrieval.
    • Name: API name.
    • PwURL: API URL on ProgrammableWeb.
    • Description: Short API description.
    • CategoryName: Primary category of the API.
    • CategoryURL: URL of the category.
    • SubmitDate: Date the API was submitted.

    3. ApiBasic.csv

    Contains detailed information about APIs.
    • ID: Unique API identifier.
    • Name: API name.
    • PwURL: API URL.
    • Provider: API provider.
    • ProviderURL: API provider's website.
    • PorHomePage: API portal/homepage.
    • Endpoint: API endpoint.
    • Version: API version.
    • Type: API type (1-Browser, 2-Product, 3-Standard, 4-System/Embedded, 5-Web/Internet).
    • ArchStyle: Architectural style (1-Indirect, 2-Native/Browser, 3-Push/Streaming, 4-REST, 5-RPC).
    • IsDeviceSpec: Whether the API is device-specific (0-False, 1-True).
    • Scope: API scope (1-Metaservice API, 2-Microservice API, 3-Single Purpose API).
    • Description: Detailed API description.
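The coded columns in ApiBasic.csv (Type, ArchStyle, Scope, IsDeviceSpec) can be turned into readable labels with small lookup tables. The following is a minimal sketch, assuming the codes are stored as plain integers in text form; the sample row is hypothetical and not taken from the dataset.

```python
import csv
import io

# Code tables copied from the ApiBasic.csv column descriptions.
API_TYPE = {1: "Browser", 2: "Product", 3: "Standard", 4: "System/Embedded", 5: "Web/Internet"}
ARCH_STYLE = {1: "Indirect", 2: "Native/Browser", 3: "Push/Streaming", 4: "REST", 5: "RPC"}
SCOPE = {1: "Metaservice API", 2: "Microservice API", 3: "Single Purpose API"}


def decode(table, raw):
    """Map a raw CSV cell to its label; return None for blank or unknown codes."""
    try:
        return table.get(int(raw))
    except (TypeError, ValueError):
        return None


def decode_row(row):
    """Return a copy of one ApiBasic row with the coded columns made readable."""
    out = dict(row)
    out["Type"] = decode(API_TYPE, row.get("Type"))
    out["ArchStyle"] = decode(ARCH_STYLE, row.get("ArchStyle"))
    out["Scope"] = decode(SCOPE, row.get("Scope"))
    # Assumption: the flag is stored as the text "0" or "1".
    out["IsDeviceSpec"] = row.get("IsDeviceSpec") == "1"
    return out


# Hypothetical sample row in the documented column layout.
SAMPLE = "ID,Name,Type,ArchStyle,IsDeviceSpec,Scope\n1,Example API,5,4,0,3\n"
rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[0]["Type"], rows[0]["ArchStyle"], rows[0]["Scope"])
```

To use it on the real file, replace the `io.StringIO(SAMPLE)` object with an open handle to ApiBasic.csv; blank or unexpected codes come back as `None` rather than raising.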

    4. ApiAddition.csv

    Includes API metadata and support information.
    • ID: API ID.
    • DocsHomePage: Documentation URL.
    • TwitterURL: Twitter support URL.
    • SupEmail: Support email.
    • Forum: API forum/message boards.
    • ConsoleURL: Interactive console URL.
    • TermURL: Terms of service URL.
    • DescFileURL: API description file URL.
    • DescFileType: File type (e.g., Swagger, RAML, WSDL).
    • IsNonPrptry: Whether the API is non-proprietary (0-False, 1-True).
    • LiceType: License type.
    • IsSslSup: SSL support (0-False, 1-True).
    • AuthModel: Authentication model.
    • ReqFmt: Supported response formats.
    • IsHyperApi: Hypermedia API flag (0-False, 1-True).
    • IsRstctAces: Restricted access (0-False, 1-True).
    • IsUnofficial: Whether it's an unofficial API (0-False, 1-True).

    5. ApiCate.csv

    Maps APIs to their respective categories.
    • ApiID: API ID.
    • CateID: Category ID.
    • IsPri: Whether it’s the primary category (0-False, 1-True).
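Because ApiCate.csv is a mapping table, resolving an API's primary category means joining it against Category.csv and keeping the rows flagged with IsPri. The sketch below shows one way to do that with the standard library; the sample rows are hypothetical, and it assumes IsPri is stored as the text "0" or "1".

```python
import csv
import io

# Hypothetical sample rows in the documented column layout.
CATEGORY_CSV = "ID,Name,PwURL,Amount\n10,Mapping,,0\n11,Social,,0\n"
API_CATE_CSV = "ApiID,CateID,IsPri\n1,10,1\n1,11,0\n2,11,1\n"


def primary_categories(category_file, api_cate_file):
    """Return {ApiID: category name} using only rows flagged IsPri == 1."""
    names = {row["ID"]: row["Name"] for row in csv.DictReader(category_file)}
    primary = {}
    for row in csv.DictReader(api_cate_file):
        if row["IsPri"] == "1":
            # Unknown category IDs are kept as None rather than dropped.
            primary[row["ApiID"]] = names.get(row["CateID"])
    return primary


result = primary_categories(io.StringIO(CATEGORY_CSV), io.StringIO(API_CATE_CSV))
print(result)
```

The same function should work for MashupCate.csv if the `ApiID` key is swapped for `MashupID`, since both mapping files share the CateID and IsPri columns.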

    6. MashupSketch.csv

    Stores basic details about mashups.
    • Name: Mashup name.
    • PwURL: Mashup URL.
    • Description: Short description.
    • CategoryName: Primary category.
    • CategoryURL: Category URL.
    • SubmitDate: Submission date.

    7. Mashup.csv

    Detailed information about mashups.
    • ID: Unique mashup ID.
    • Name: Mashup name.
    • PwURL: Mashup URL.
    • Company: Company associated with the mashup.
    • URL: Direct link to the mashup.
    • Description: Detailed description.
    • Type: Type (1-Web, 2-Mobile, 3-Desktop, 4-Other).

    8. MashupCate.csv

    Maps mashups to categories.
    • MashupID: Mashup ID.
    • CateID: Category ID.
    • IsPri: Whether it’s the primary category (0-False, 1-True).

    9. MashupApi.csv

    Maps mashups to the APIs they use.
    • MashupID: Mashup ID.
    • ApiID: API ID.
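One simple proxy for API popularity is the number of distinct mashups that use each API, which can be read directly from MashupApi.csv. This is a rough sketch under that assumption; the sample rows are hypothetical, and duplicate (MashupID, ApiID) pairs are collapsed in case the file contains any.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the last pair is repeated on purpose.
MASHUP_API_CSV = "MashupID,ApiID\n100,1\n100,2\n101,1\n102,1\n102,1\n"


def api_usage_counts(mashup_api_file):
    """Count how many distinct mashups reference each API."""
    pairs = {
        (row["MashupID"], row["ApiID"])
        for row in csv.DictReader(mashup_api_file)
    }
    return Counter(api_id for _, api_id in pairs)


counts = api_usage_counts(io.StringIO(MASHUP_API_CSV))
print(counts.most_common(2))
```

The resulting ApiID keys can then be looked up in ApiBasic.csv to attach names and providers.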

    Usage

    • This dataset is ideal for research on API usage trends, category distributions, and mashup compositions.
    • It can be used to study API popularity, analyze technological trends, or build recommendation systems for developers looking for APIs.

    Acknowledgments

    Data sourced from ProgrammableWeb.

  2. Worldwide web application and API risk density 2024

    • statista.com
    Statista, Worldwide web application and API risk density 2024 [Dataset]. https://www.statista.com/statistics/805982/worldwide-application-layer-risk-density/
    Dataset authored and provided by: Statista (http://statista.com/)
    Time period covered: 2024
    Area covered: World
    Description

    According to a 2024 study, nearly ** percent of web application and API risks were categorized as low. Only *** percent of web application and API risks were critical in 2024.

  3. DataForSEO Labs API for keyword research and search analytics, real-time...

    • datarade.ai
    .json
    Updated Jun 4, 2021
    DataForSEO (2021). DataForSEO Labs API for keyword research and search analytics, real-time data for all Google locations and languages [Dataset]. https://datarade.ai/data-products/dataforseo-labs-api-for-keyword-research-and-search-analytics-dataforseo
    Available download formats: .json
    Dataset updated: Jun 4, 2021
    Dataset authored and provided by: DataForSEO
    Area covered: Isle of Man, Korea (Democratic People's Republic of), Armenia, Cocos (Keeling) Islands, Morocco, Mauritania, Micronesia (Federated States of), Kenya, Tokelau, Azerbaijan
    Description

    DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:

    • Related Keywords from the “searches related to” element of Google SERP.
    • Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
    • Keyword Ideas that fall into the same category as specified seed keywords.
    • Historical Search Volume with current cost-per-click, and competition values.

    Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.

    You will find well-rounded ways to scout the competitors:

    • Domain Whois Overview with ranking and traffic info from organic and paid search.
    • Ranked Keywords that any domain or URL has positions for in SERP.
    • SERP Competitors and the rankings they hold for the keywords you specify.
    • Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
    • Domain Intersection keywords for which both specified domains rank within the same SERPs.
    • Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
    • Relevant Pages of the specified domain with rankings and traffic data.
    • Domain Rank Overview with ranking and traffic data from organic and paid search.
    • Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
    • Page Intersection keywords for which the specified pages rank within the same SERP.

    All DataForSEO Labs API endpoints function in the Live mode. This means you will be provided with the results in response right after sending the necessary parameters with a POST request.

    The limit is 2000 API calls per minute; contact our support team if your project requires higher rates.
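A client that sends many POST requests may want to stay under the stated limit of 2000 calls per minute on its own side. The following is a generic sliding-window throttle, offered as a sketch only: the class name and its parameters are hypothetical, it is not part of any DataForSEO client library, and how the service itself counts calls within a minute is an assumption.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls=2000, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()  # start times of calls still inside the window

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call expires.
            self._sleep(self.period - (now - self._stamps[0]))
```

Calling `limiter.acquire()` immediately before each request keeps the client at or below the configured rate; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.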

    We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.

    We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.

  4. Developer Doc URLs vs Non-Developer URLs Dataset

    • kaggle.com
    zip
    Updated Jul 23, 2025
    Syed Mohammed Faham (2025). Developer Doc URLs vs Non-Developer URLs Dataset [Dataset]. https://www.kaggle.com/datasets/iamfaham/developer-doc-urls-vs-non-developer-urls-dataset
    Available download formats: zip (96516 bytes)
    Dataset updated: Jul 23, 2025
    Authors: Syed Mohammed Faham
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Overview

    This dataset contains 2,000 labeled URLs enriched with structural and content-based metadata, curated for binary classification tasks. Each URL is labeled as either a developer documentation link (1) or a non-developer link (0).

    The goal is to help machine learning practitioners build robust models that can distinguish between technical documentation and general web content. This dataset is especially useful for building developer-facing search, filtering, and recommendation systems.

    Columns

    url_dataset_enhanced.csv

    • url: The full web URL.
    • url_length: Length of the URL in characters.
    • path_depth: Number of segments in the URL path.
    • has_query: 1 if the URL contains a query string (?), else 0.
    • has_keyword: 1 if the URL contains common dev-related keywords (e.g., docs, api, reference), else 0.
    • has_docs_subdomain: 1 if the subdomain includes 'docs' (e.g., docs.python.org), else 0.
    • code_count: Number of code snippets extracted from the page.
    • tech_keyword_count: Count of tech-related terms found in the page.
    • content_length: Length of extracted HTML/text content (in characters).
    • has_table: 1 if a <table> tag is present in the content, else 0.
    • is_dev_docs:
      • 1 for developer documentation (e.g., SDK docs, API references).
      • 0 for non-developer pages (e.g., blog posts, landing pages).
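The URL-only columns (url_length, path_depth, has_query, has_keyword, has_docs_subdomain) can be approximated from the URL string alone. The sketch below is one possible reading of the column descriptions, not the author's extraction code: the keyword list contains only the three examples given, path depth is taken as the count of non-empty path segments, and the subdomain check looks for a host label equal to "docs" ahead of the last two labels, which would misfire on multi-part suffixes such as co.uk.

```python
from urllib.parse import urlsplit

# Assumed keyword list; the description only gives these three as examples.
DEV_KEYWORDS = ("docs", "api", "reference")


def url_features(url):
    """Compute the URL-only columns under the assumptions stated above."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    lowered = url.lower()
    return {
        "url": url,
        "url_length": len(url),
        "path_depth": len(segments),
        "has_query": int(bool(parts.query)),
        "has_keyword": int(any(k in lowered for k in DEV_KEYWORDS)),
        # Labels before the registered domain, e.g. ["docs"] for docs.python.org.
        "has_docs_subdomain": int("docs" in host.split(".")[:-2]),
    }


features = url_features("https://docs.python.org/3/library/csv.html?highlight=reader")
print(features)
```

The content-based columns (code_count, tech_keyword_count, content_length, has_table) need the fetched page and are not covered here.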

    url_dataset.csv

    • url: The full web URL.
    • is_dev_docs:
      • 1 for developer documentation (e.g., SDK docs, API references).
      • 0 for non-developer pages (e.g., blog posts, landing pages).

    Key Statistics

    • Total entries: 2,000
    • Developer documentation (1): 1,000
    • Non-developer links (0): 1,000

    Potential Use Cases

    • Train binary classifiers using traditional ML or LLM embeddings.
    • Develop document filtering systems for technical web crawlers.
    • Benchmark classification performance on real-world noisy data.
    • Study web structure and content heuristics useful for link classification.
    • Build domain-specific data pipelines or RAG systems that focus on technical sources.

    Format

    • Filename: url_dataset_enhanced.csv & url_dataset.csv
    • Rows: 2,000
    • Encoding: UTF-8
    • File type: CSV with headers

    Acknowledgements

    This dataset was manually curated, preprocessed, and labeled from scratch. It is released for public research and educational use.

    License

    This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0).

  5. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    • +3more
    Updated Sep 30, 2024
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Dataset updated: Sep 30, 2024
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  6. ClassyFire

    • bioregistry.io
    Updated Apr 29, 2021
    (2021). ClassyFire [Dataset]. https://bioregistry.io/classyfire
    Dataset updated: Apr 29, 2021
    Description

    ClassyFire is a web-based application for automated structural classification of chemical entities. This application uses a rule-based approach that relies on a comprehensible, comprehensive, and computable chemical taxonomy. ClassyFire provides a hierarchical chemical classification of chemical entities (mostly small molecules and short peptide sequences), as well as a structure-based textual description, based on a chemical taxonomy named ChemOnt, which covers 4825 chemical classes of organic and inorganic compounds. Moreover, ClassyFire allows for text-based search via its web interface. It can be accessed via the web interface or via the ClassyFire API.

  7. Spotify Tracks Popularity

    • kaggle.com
    zip
    Updated Sep 17, 2025
    Lynnxxx (2025). Spotify Tracks Popularity [Dataset]. https://www.kaggle.com/datasets/lynnxxx/spotify-tracks-popularity-classification
    Available download formats: zip (6144231 bytes)
    Dataset updated: Sep 17, 2025
    Authors: Lynnxxx
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    Spotify for Developers offers a wide range of possibilities to utilize the extensive catalog of Spotify data. One of them is the set of audio features calculated for each song and made available via the official Spotify Web API. This dataset contains 9,460 Spotify tracks with comprehensive audio features and metadata, specifically curated for music popularity classification and machine learning projects. The data has been filtered and processed to ensure high quality and completeness for analysis purposes.

    Content

    Each track (row) contains 28 features, including:

    • Track Information: Artist name, track name, track ID, release date, and popularity score
    • Audio Features: Danceability, energy, valence, acousticness, instrumentalness, liveness, speechiness, tempo, and loudness
    • Technical Metadata: Musical key, mode, time signature, duration, and Spotify API references
    • Additional Data: Genres, lyrics, preview URLs, and playlist information

    The popularity feature (0-100 scale) serves as the primary target variable for classification tasks.

    Acknowledgements

    Credit goes entirely to Spotify for providing this data via their Web API. The audio features are calculated by Spotify's proprietary algorithms and represent the most comprehensive music analysis data available. Reference: https://developer.spotify.com/documentation/web-api

  8. indoor plants data set from api

    • kaggle.com
    zip
    Updated Mar 2, 2023
    abhishek (2023). indoor plants data set from api [Dataset]. https://www.kaggle.com/datasets/iottech/plant
    Available download formats: zip (5870 bytes)
    Dataset updated: Mar 2, 2023
    Authors: abhishek
    Description

    This dataset describes indoor plants and contains the following features:

    • common name: the name of the plant in the local region
    • family: the family of the plant
    • categories: category of family
    • origin: where the plant was first found or grown
    • climate: a climate that is suitable for the plant
    • zone: latitudes and longitudes
    • img_url: an image URL of the plant


