8 datasets found

Data from: Programmable Web
kaggle.com
zip
Updated Mar 26, 2025
Rik (2025). Programmable Web [Dataset]. https://www.kaggle.com/datasets/rimkomatic/programmable-web/data
Available download formats
zip (5490552 bytes)
Dataset updated
Mar 26, 2025
Authors
Rik
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
ProgrammableWeb Dataset

Overview

This dataset contains structured information about APIs, mashups, and categories from ProgrammableWeb, one of the most comprehensive directories of web APIs. The data has been extracted from a MySQL database and converted into CSV format for easy use in data analysis and machine learning applications.

Dataset Contents

The dataset is composed of multiple CSV files, each representing a different aspect of the API ecosystem:

1. Category.csv

Contains information about API categories.
- ID: Unique identifier for the category.
- Name: Name of the category.
- PwURL: ProgrammableWeb URL for the category.
- Amount: Count of APIs in this category (approximate).

2. ApiSketch.csv

Stores basic details about APIs before full data retrieval.
- Name: API name.
- PwURL: API URL on ProgrammableWeb.
- Description: Short API description.
- CategoryName: Primary category of the API.
- CategoryURL: URL of the category.
- SubmitDate: Date the API was submitted.

3. ApiBasic.csv

Contains detailed information about APIs.
- ID: Unique API identifier.
- Name: API name.
- PwURL: API URL.
- Provider: API provider.
- ProviderURL: API provider's website.
- PorHomePage: API portal/homepage.
- Endpoint: API endpoint.
- Version: API version.
- Type: API type (1-Browser, 2-Product, 3-Standard, 4-System/Embedded, 5-Web/Internet).
- ArchStyle: Architectural style (1-Indirect, 2-Native/Browser, 3-Push/Streaming, 4-REST, 5-RPC).
- IsDeviceSpec: Whether the API is device-specific (0-False, 1-True).
- Scope: API scope (1-Metaservice API, 2-Microservice API, 3-Single Purpose API).
- Description: Detailed API description.

4. ApiAddition.csv

Includes API metadata and support information.
- ID: API ID.
- DocsHomePage: Documentation URL.
- TwitterURL: Twitter support URL.
- SupEmail: Support email.
- Forum: API forum/message boards.
- ConsoleURL: Interactive console URL.
- TermURL: Terms of service URL.
- DescFileURL: API description file URL.
- DescFileType: File type (e.g., Swagger, RAML, WSDL).
- IsNonPrptry: Whether the API is non-proprietary (0-False, 1-True).
- LiceType: License type.
- IsSslSup: SSL support (0-False, 1-True).
- AuthModel: Authentication model.
- ReqFmt: Supported response formats.
- IsHyperApi: Hypermedia API flag (0-False, 1-True).
- IsRstctAces: Restricted access (0-False, 1-True).
- IsUnofficial: Whether it's an unofficial API (0-False, 1-True).

5. ApiCate.csv

Maps APIs to their respective categories.
- ApiID: API ID.
- CateID: Category ID.
- IsPri: Whether it’s the primary category (0-False, 1-True).

6. MashupSketch.csv

Stores basic details about mashups.
- Name: Mashup name.
- PwURL: Mashup URL.
- Description: Short description.
- CategoryName: Primary category.
- CategoryURL: Category URL.
- SubmitDate: Submission date.

7. Mashup.csv

Detailed information about mashups.
- ID: Unique mashup ID.
- Name: Mashup name.
- PwURL: Mashup URL.
- Company: Company associated with the mashup.
- URL: Direct link to the mashup.
- Description: Detailed description.
- Type: Type (1-Web, 2-Mobile, 3-Desktop, 4-Other).

8. MashupCate.csv

Maps mashups to categories.
- MashupID: Mashup ID.
- CateID: Category ID.
- IsPri: Whether it’s the primary category (0-False, 1-True).

9. MashupApi.csv

Maps mashups to the APIs they use.
- MashupID: Mashup ID.
- ApiID: API ID.
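
Several ApiBasic.csv columns store integer codes rather than labels. The sketch below shows one way to decode them with Python's standard library. It assumes the CSV has a header row using the column names listed above (the description does not confirm this), and the sample rows are made up for illustration.

```python
import csv
import io

# Code tables copied from the ApiBasic.csv column descriptions.
API_TYPE = {1: "Browser", 2: "Product", 3: "Standard", 4: "System/Embedded", 5: "Web/Internet"}
ARCH_STYLE = {1: "Indirect", 2: "Native/Browser", 3: "Push/Streaming", 4: "REST", 5: "RPC"}
SCOPE = {1: "Metaservice API", 2: "Microservice API", 3: "Single Purpose API"}


def _lookup(table, raw):
    """Map a raw CSV cell to its label; None for empty or unknown codes."""
    raw = (raw or "").strip()
    return table.get(int(raw)) if raw.isdigit() else None


def decode_api_rows(csv_text):
    """Yield ApiBasic rows with the coded columns replaced by labels."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {
            "ID": int(row["ID"]),
            "Name": row["Name"],
            "Type": _lookup(API_TYPE, row.get("Type")),
            "ArchStyle": _lookup(ARCH_STYLE, row.get("ArchStyle")),
            "IsDeviceSpec": (row.get("IsDeviceSpec") or "").strip() == "1",
            "Scope": _lookup(SCOPE, row.get("Scope")),
        }


# Made-up sample rows standing in for the real file (assumed header layout).
SAMPLE = """ID,Name,Type,ArchStyle,IsDeviceSpec,Scope
1,Example Maps,5,4,0,3
2,Example Sensor,4,,1,
"""

decoded = list(decode_api_rows(SAMPLE))
for api in decoded:
    print(api)
```

Empty or unrecognized codes are mapped to None rather than raising, since the description does not say whether every row is fully populated.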

Usage

This dataset is ideal for research on API usage trends, category distributions, and mashup compositions.

It can be used to study API popularity, analyze technological trends, or build recommendation systems for developers looking for APIs.
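
As a minimal sketch of the API popularity use case, the code below counts how many distinct mashups reference each API via MashupApi.csv and resolves the names through ApiBasic.csv. Measuring popularity by mashup count is an assumption made here, not a definition from the dataset. The header rows and all sample values are illustrative.

```python
import csv
import io
from collections import Counter


def api_popularity(mashup_api_csv, api_basic_csv):
    """Return [(api_name, mashup_count), ...] sorted by descending count."""
    names = {row["ID"]: row["Name"] for row in csv.DictReader(io.StringIO(api_basic_csv))}
    # Use distinct (mashup, API) pairs so duplicate rows do not inflate counts.
    pairs = {
        (row["MashupID"], row["ApiID"])
        for row in csv.DictReader(io.StringIO(mashup_api_csv))
    }
    counts = Counter(api_id for _, api_id in pairs)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # APIs missing from ApiBasic.csv are kept, with a placeholder name.
    return [(names.get(api_id, f"<unknown {api_id}>"), n) for api_id, n in ranked]


# Made-up sample data (assumed header layout).
MASHUP_API = """MashupID,ApiID
10,1
10,2
11,1
12,1
12,3
12,3
"""
API_BASIC = """ID,Name
1,Example Maps
2,Example Photos
"""

ranking = api_popularity(MASHUP_API, API_BASIC)
print(ranking)
```

The same MashupID-to-ApiID pairs could also feed a co-occurrence based recommender, which is one way to approach the recommendation use case mentioned above.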

Acknowledgments

Data sourced from ProgrammableWeb.
Worldwide web application and API risk density 2024
statista.com
Statista, Worldwide web application and API risk density 2024 [Dataset]. https://www.statista.com/statistics/805982/worldwide-application-layer-risk-density/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
World
Description
According to a 2024 study, nearly ** percent of web application and API risks were categorized as low. Only *** percent of web application and API risks were critical in 2024.
DataForSEO Labs API for keyword research and search analytics, real-time...
datarade.ai
.json
Updated Jun 4, 2021
DataForSEO (2021). DataForSEO Labs API for keyword research and search analytics, real-time data for all Google locations and languages [Dataset]. https://datarade.ai/data-products/dataforseo-labs-api-for-keyword-research-and-search-analytics-dataforseo
Available download formats
.json
Dataset updated
Jun 4, 2021
Dataset authored and provided by
DataForSEO
Area covered
Isle of Man, Korea (Democratic People's Republic of), Armenia, Cocos (Keeling) Islands, Morocco, Mauritania, Micronesia (Federated States of), Kenya, Tokelau, Azerbaijan
Description
DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:

• Related Keywords from the “searches related to” element of Google SERP.
• Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
• Keyword Ideas that fall into the same category as specified seed keywords.
• Historical Search Volume with current cost-per-click, and competition values.

Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.

You will find well-rounded ways to scout the competitors:

• Domain Whois Overview with ranking and traffic info from organic and paid search.
• Ranked Keywords that any domain or URL has positions for in SERP.
• SERP Competitors and the rankings they hold for the keywords you specify.
• Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
• Domain Intersection keywords for which both specified domains rank within the same SERPs.
• Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
• Relevant Pages of the specified domain with rankings and traffic data.
• Domain Rank Overview with ranking and traffic data from organic and paid search.
• Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
• Page Intersection keywords for which the specified pages rank within the same SERP.

All DataForSEO Labs API endpoints function in the Live mode. This means you will be provided with the results in response right after sending the necessary parameters with a POST request.

The limit is 2000 API calls per minute. If your project requires higher rates, you can contact our support team.
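
A client can stay under the per-minute limit by throttling its own requests. The sketch below is a generic sliding-window limiter, not part of any DataForSEO client library. The class name, its parameters, and the fake clock in the demo are all hypothetical, and no real endpoint is called.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any rolling window of `period` seconds."""

    def __init__(self, max_calls=2000, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of calls still inside the window

    def acquire(self):
        """Block until one more call is allowed; return the seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget calls that have left the rolling window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                break
            # Window is full: wait until the oldest call expires.
            pause = self.period - (now - self._stamps[0])
            self._sleep(pause)
            waited += pause
        self._stamps.append(now)
        return waited


# Demo with a fake clock so the example finishes instantly.
fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


limiter = SlidingWindowLimiter(max_calls=2, period=60.0,
                               clock=lambda: fake_now[0], sleep=fake_sleep)
waits = [limiter.acquire() for _ in range(3)]
print(waits)  # [0.0, 0.0, 60.0]
```

In real use, one would call `acquire()` on a limiter built with the defaults just before sending each POST request.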

We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.

We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.
Developer Doc URLs vs Non-Developer URLs Dataset
kaggle.com
zip
Updated Jul 23, 2025
Syed Mohammed Faham (2025). Developer Doc URLs vs Non-Developer URLs Dataset [Dataset]. https://www.kaggle.com/datasets/iamfaham/developer-doc-urls-vs-non-developer-urls-dataset
Available download formats
zip (96516 bytes)
Dataset updated
Jul 23, 2025
Authors
Syed Mohammed Faham
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Overview

This dataset contains 2,000 labeled URLs enriched with structural and content-based metadata, curated for binary classification tasks. Each URL is labeled as either a developer documentation link (1) or a non-developer link (0).

The goal is to help machine learning practitioners build robust models that can distinguish between technical documentation and general web content. This dataset is especially useful for building developer-facing search, filtering, and recommendation systems.

Columns

url_dataset_enhanced.csv

url: The full web URL.

url_length: Length of the URL in characters.

path_depth: Number of segments in the URL path.

has_query: 1 if the URL contains a query string (?), else 0.

has_keyword: 1 if the URL contains common dev-related keywords (e.g., docs, api, reference), else 0.

has_docs_subdomain: 1 if the subdomain includes 'docs' (e.g., docs.python.org), else 0.

code_count: Number of code snippets extracted from the page.

tech_keyword_count: Count of tech-related terms found in the page.

content_length: Length of extracted HTML/text content (in characters).

has_table: 1 if a <table> tag is present in the content, else 0.

is_dev_docs:

1 for developer documentation (e.g., SDK docs, API references).

0 for non-developer pages (e.g., blog posts, landing pages).

url_dataset.csv

url: The full web URL.

is_dev_docs:

1 for developer documentation (e.g., SDK docs, API references).

0 for non-developer pages (e.g., blog posts, landing pages).
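
The URL-derived columns in url_dataset_enhanced.csv can be approximated from the URL string alone. The sketch below is an assumption about how such features might be computed, not the author's actual extraction code. In particular, the keyword list only contains the three examples given above, and the subdomain rule is a naive one.

```python
from urllib.parse import urlsplit

# Assumed keyword list: the description only gives these three as examples.
DEV_KEYWORDS = ("docs", "api", "reference")


def url_features(url):
    """Compute URL-level features resembling the dataset's structural columns."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive rule: everything left of the last two host labels is the subdomain
    # (this ignores multi-part suffixes such as co.uk).
    subdomain = ".".join(host.split(".")[:-2])
    segments = [s for s in parts.path.split("/") if s]
    lowered = url.lower()
    return {
        "url_length": len(url),
        "path_depth": len(segments),
        "has_query": int("?" in url),
        "has_keyword": int(any(k in lowered for k in DEV_KEYWORDS)),
        "has_docs_subdomain": int("docs" in subdomain),
    }


doc_url = "https://docs.python.org/3/library/urllib.parse.html?highlight=url"
blog_url = "https://example.com/blog/post"
print(url_features(doc_url))
print(url_features(blog_url))
```

The content-based columns (code_count, tech_keyword_count, content_length, has_table) require fetching and parsing each page, so they are left out of this sketch.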

Key Statistics

Total entries: 2,000

Developer documentation (1): 1,000

Non-developer links (0): 1,000

Potential Use Cases

Train binary classifiers using traditional ML or LLM embeddings.

Develop document filtering systems for technical web crawlers.

Benchmark classification performance on real-world noisy data.

Study web structure and content heuristics useful for link classification.

Build domain-specific data pipelines or RAG systems that focus on technical sources.
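
Before training a model, it can help to benchmark a trivial rule against the labels. The sketch below scores a hypothetical baseline that predicts developer documentation whenever has_keyword or has_docs_subdomain is set. The rule and the four sample rows are made up for illustration and say nothing about real performance on the dataset.

```python
def baseline_predict(row):
    """Predict 1 when either URL-level hint column is set (illustrative rule)."""
    return int(int(row["has_keyword"]) == 1 or int(row["has_docs_subdomain"]) == 1)


def evaluate(rows, predict):
    """Return accuracy, precision, and recall of `predict` against is_dev_docs."""
    tp = fp = tn = fn = 0
    for row in rows:
        actual = int(row["is_dev_docs"])
        predicted = predict(row)
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up rows shaped like csv.DictReader output (all values are strings).
sample_rows = [
    {"has_keyword": "1", "has_docs_subdomain": "1", "is_dev_docs": "1"},
    {"has_keyword": "0", "has_docs_subdomain": "0", "is_dev_docs": "0"},
    {"has_keyword": "1", "has_docs_subdomain": "0", "is_dev_docs": "0"},
    {"has_keyword": "0", "has_docs_subdomain": "0", "is_dev_docs": "1"},
]

metrics = evaluate(sample_rows, baseline_predict)
print(metrics)
```

Because the dataset is balanced (1,000 rows per class), accuracy is a reasonable headline metric, and any trained classifier should beat a baseline of this kind.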

Format

Filename: url_dataset_enhanced.csv & url_dataset.csv

Rows: 2,000

Encoding: UTF-8

File type: CSV with headers

Acknowledgements

This dataset was manually curated, preprocessed, and labeled from scratch. It is released for public research and educational use.

License

This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0).
Protein Structural Domain Classification
cathdb.info
ec.i4cologne.com
+3 more
Updated Sep 30, 2024
(2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
Unique identifier
https://identifiers.org/MIR:00100005
Dataset updated
Sep 30, 2024
Description
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
ClassyFire
bioregistry.io
Updated Apr 29, 2021
(2021). ClassyFire [Dataset]. https://bioregistry.io/classyfire
Dataset updated
Apr 29, 2021
Description
ClassyFire is a web-based application for automated structural classification of chemical entities. This application uses a rule-based approach that relies on a comprehensible, comprehensive, and computable chemical taxonomy. ClassyFire provides a hierarchical chemical classification of chemical entities (mostly small molecules and short peptide sequences), as well as a structure-based textual description, based on a chemical taxonomy named ChemOnt, which covers 4825 chemical classes of organic and inorganic compounds. Moreover, ClassyFire allows for text-based search via its web interface. It can be accessed via the web interface or via the ClassyFire API.
Spotify Tracks Popularity
kaggle.com
zip
Updated Sep 17, 2025
Lynnxxx (2025). Spotify Tracks Popularity [Dataset]. https://www.kaggle.com/datasets/lynnxxx/spotify-tracks-popularity-classification
Available download formats
zip (6144231 bytes)
Dataset updated
Sep 17, 2025
Authors
Lynnxxx
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Spotify for Developers offers a wide range of possibilities for using Spotify's extensive catalog of data. One of these is the set of audio features calculated for each song and made available via the official Spotify Web API. This dataset contains 9,460 Spotify tracks with comprehensive audio features and metadata, specifically curated for music popularity classification and machine learning projects. The data has been filtered and processed to ensure high quality and completeness for analysis purposes.

Content

Each track (row) contains 28 features, including:
- Track Information: Artist name, track name, track ID, release date, and popularity score
- Audio Features: Danceability, energy, valence, acousticness, instrumentalness, liveness, speechiness, tempo, and loudness
- Technical Metadata: Musical key, mode, time signature, duration, and Spotify API references
- Additional Data: Genres, lyrics, preview URLs, and playlist information

The popularity feature (0-100 scale) serves as the primary target variable for classification tasks.
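
To use the 0-100 popularity score as a classification target, it first has to be mapped to discrete classes. The sketch below shows one possible binning. The three class names and the cut-offs at 34 and 67 are arbitrary assumptions for illustration; the dataset description does not define any classes.

```python
# Hypothetical class boundaries (lower bound inclusive), checked from highest to lowest.
POPULARITY_BINS = ((67, "high"), (34, "medium"), (0, "low"))


def popularity_class(score):
    """Map a 0-100 popularity score to a coarse class label."""
    if not 0 <= score <= 100:
        raise ValueError(f"popularity must be in [0, 100], got {score}")
    for lower_bound, label in POPULARITY_BINS:
        if score >= lower_bound:
            return label


labels = [popularity_class(s) for s in (0, 33, 34, 66, 67, 100)]
print(labels)  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```

In practice the boundaries would more likely be chosen from the observed score distribution, for example by quantiles, so that the classes are reasonably balanced.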

Acknowledgements

Credit goes entirely to Spotify for providing this data via their Web API. The audio features are calculated by Spotify's proprietary algorithms and represent the most comprehensive music analysis data available. Reference: https://developer.spotify.com/documentation/web-api
indoor plants data set from api
kaggle.com
zip
Updated Mar 2, 2023
abhishek (2023). indoor plants data set from api [Dataset]. https://www.kaggle.com/datasets/iottech/plant
Available download formats
zip (5870 bytes)
Dataset updated
Mar 2, 2023
Authors
abhishek
Description
This dataset of indoor plants contains the following features:
- common name: the name of the plant in the local region.
- family: the family of the plant.
- categories: category of family.
- origin: where the plant was first found or grown.
- climate: a climate that is suitable for the plant.
- zone: (latitudes and longitudes).
- img_url: an image URL of the plant.