6 datasets found
  1. DBLP-Scholar

    • kaggle.com
    zip
    Updated Apr 19, 2022
    Cite
    Mostafa Massoud (2022). DBLP-Scholar [Dataset]. https://www.kaggle.com/datasets/mostafafathy4869/dblpscholar/suggestions
    Available download formats: zip (4211634 bytes)
    Authors
    Mostafa Massoud
    Description

    Datasets for Binary Entity Resolution

    Source Page : DBLP-Source

    In the VLDB 2010 paper [1] we present a first comparative evaluation on the relative match quality and runtime efficiency of entity resolution approaches using challenging real-world match tasks. The evaluation considers existing approaches both with and without using machine learning to find suitable parameterization and combination of similarity functions. In addition to approaches from the research community a state-of-the-art commercial entity resolution implementation is considered. Our results indicate significant quality and efficiency differences between different approaches. We also find that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.

    The dataset consists of three tables covering two lists of academic publications, DBLP and Scholar:

    1. DBLP1.csv: contains no redundant entities.
    2. Scholar.csv: contains messy data with redundant entities.
    3. DBLP-Scholar_PerfectMapping.csv: the perfect mapping for entities between both tables.

    Workflow:

    Provide an approach that recovers the perfect mapping between the DBLP1 and Scholar datasets, that is, one that finds every DBLP document that also appears in Scholar, including documents duplicated within Scholar.
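
    The matching task described in the workflow can be sketched as follows. This is a minimal illustration, not the dataset author's method: the toy records, the field names ("id", "title"), the token-sorting normalization, and the 0.9 threshold are all assumptions chosen for the example. In practice the rows would come from DBLP1.csv and Scholar.csv and the result would be scored against DBLP-Scholar_PerfectMapping.csv.

```python
from difflib import SequenceMatcher

# Toy records standing in for rows of DBLP1.csv and Scholar.csv.
# The field names ("id", "title") are assumptions, not the files' confirmed schema.
dblp = [
    {"id": "d1", "title": "Efficient Similarity Joins for Near Duplicate Detection"},
    {"id": "d2", "title": "A Survey of Approaches to Automatic Schema Matching"},
]
scholar = [
    {"id": "s1", "title": "efficient similarity joins for near-duplicate detection"},
    {"id": "s2", "title": "Survey of approaches to automatic schema matching, A"},
    {"id": "s3", "title": "Query optimization in database systems"},
]


def normalize(text):
    """Lowercase, turn punctuation into spaces, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match(left, right, threshold=0.9):
    """Compare every left record with every right record and keep pairs above the threshold."""
    pairs = []
    for l in left:
        for r in right:
            if similarity(l["title"], r["title"]) >= threshold:
                pairs.append((l["id"], r["id"]))
    return pairs


predicted = match(dblp, scholar)
print(predicted)
```

    Comparing all pairs is quadratic in the number of records, so a real run over both tables would normally add a blocking step (for example, only comparing records that share a title token) before scoring candidates.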

  2. Cora Dataset

    • search.gesis.org
    Updated Oct 29, 2021
    Cite
    Ramezani, Mahin (2021). Cora Dataset [Dataset]. http://doi.org/10.3886/E109167V2-11132
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    GESIS search
    Authors
    Ramezani, Mahin
    License

    https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de675664

    Description

    Abstract (en): The Cora data contains bibliographic records of machine learning papers that have been manually clustered into groups that refer to the same publication. Originally, Cora was prepared by Andrew McCallum, and his versions of this data set are available on his Data web page. The data is also hosted here. Note that various versions of the Cora data set have been used by many publications in record linkage and entity resolution over the years.
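
    Because the Cora records are clustered into groups that refer to the same publication, a cluster labelling can be turned into pairwise ground truth for evaluating a matcher. The sketch below shows one common way to do this; the record ids, cluster ids, and the sample prediction are hypothetical and do not come from the Cora files.

```python
from collections import defaultdict
from itertools import combinations


def pairs_from_clusters(labels):
    """Turn {record_id: cluster_id} into the set of unordered matching pairs."""
    groups = defaultdict(list)
    for record_id, cluster_id in labels.items():
        groups[cluster_id].append(record_id)
    pairs = set()
    for members in groups.values():
        # Sorting gives each pair a canonical (smaller, larger) order.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, truth):
    """Pairwise precision and recall of predicted pairs against gold pairs."""
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical gold clustering: r1, r2, r3 cite the same paper; r4 is a different one.
gold = {"r1": "paperA", "r2": "paperA", "r3": "paperA", "r4": "paperB"}
# Hypothetical system output that wrongly groups r3 with r4.
guess = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c2"}

precision, recall = pairwise_scores(pairs_from_clusters(guess), pairs_from_clusters(gold))
print(precision, recall)
```

    Pairwise scores weight large clusters heavily, since a cluster of n records contributes n(n-1)/2 pairs; that is worth keeping in mind when comparing results across versions of the data.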

  3. Named Entity Linking AI Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Cite
    Growth Market Reports (2025). Named Entity Linking AI Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/named-entity-linking-ai-market
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Named Entity Linking AI Market Outlook

    According to our latest research, the global Named Entity Linking AI market size in 2024 stands at USD 1.42 billion, demonstrating robust momentum driven by the proliferation of AI-powered data analytics and natural language processing technologies. The market is forecasted to reach USD 7.98 billion by 2033, expanding at a remarkable CAGR of 21.2% during the period from 2025 to 2033. This significant growth is primarily propelled by the escalating adoption of AI for automating information extraction and enhancing digital content understanding across various industries.
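
    The quoted growth rate can be sanity-checked with the standard compound annual growth rate formula, (end / start)^(1 / periods) - 1. The sketch below assumes nine compounding periods, counted from the 2024 base year to 2033; the report does not state how it counts periods, so that is an assumption. Under it, the implied rate comes out at roughly 21%, in line with the quoted 21.2%.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate between two values over a number of periods."""
    return (end_value / start_value) ** (1 / periods) - 1


start_usd_bn = 1.42   # 2024 market size quoted above
end_usd_bn = 7.98     # 2033 forecast quoted above
years = 2033 - 2024   # assumption: nine compounding periods from the 2024 base

rate = cagr(start_usd_bn, end_usd_bn, years)
print(f"Implied CAGR: {rate:.1%}")
```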

    The surge in demand for advanced natural language processing (NLP) solutions is a major growth driver for the Named Entity Linking AI market. As organizations accumulate vast volumes of unstructured data from multiple digital channels, the need for automated tools to identify, disambiguate, and link entities within text has become critical. Named Entity Linking (NEL) AI solutions enable businesses to extract actionable insights from text, improve search relevance, and enhance customer experiences. Sectors such as BFSI, healthcare, and e-commerce are increasingly leveraging NEL AI to streamline compliance, personalize content, and automate document processing, which is fueling widespread adoption.

    Another pivotal growth factor is the integration of Named Entity Linking AI into knowledge graph construction and content recommendation systems. Enterprises are investing heavily in AI-driven knowledge management tools to organize and contextualize data, making information retrieval more efficient. NEL AI plays a crucial role in building and maintaining knowledge graphs by accurately linking entities to real-world concepts and databases. This capability is invaluable for applications ranging from enterprise search and digital assistants to fraud detection and sentiment analysis. The growing focus on digital transformation and intelligent automation is expected to further accelerate the deployment of NEL AI solutions across diverse verticals.

    The continuous advancements in machine learning algorithms and the increasing availability of high-quality annotated datasets have significantly enhanced the accuracy and scalability of Named Entity Linking AI. Vendors are developing more sophisticated models capable of handling multilingual data, domain-specific jargon, and context-sensitive entity resolution. The expansion of cloud computing has also democratized access to powerful NEL AI tools, enabling even small and medium enterprises to implement these solutions without substantial upfront investments. As regulatory and ethical considerations around data privacy and AI transparency become more prominent, vendors are also focusing on explainable AI and secure deployment practices, further boosting market confidence and adoption.

    From a regional perspective, North America currently dominates the Named Entity Linking AI market, accounting for the largest share due to the early adoption of AI technologies and the presence of leading NLP research institutions and tech companies. However, the Asia Pacific region is witnessing the fastest growth, driven by the rapid digitization of enterprises, government initiatives promoting AI innovation, and the expanding e-commerce and fintech sectors. Europe is also a significant market, with strong investments in AI research and a growing emphasis on data-driven decision-making in both public and private sectors. Latin America and the Middle East & Africa, while still nascent, are expected to offer lucrative opportunities as digital transformation initiatives gain traction in these regions.

    Ontology Management AI is increasingly becoming a vital component in the realm of Named Entity Linking AI, as it provides a structured framework for organizing and managing complex data relationships. By integrating Ontology Management AI, organizations can enhance their ability to interpret and contextualize data, leading to more accurate entity linking and improved knowledge graph construction. This integration supports the seamless alignment of data across diverse domains, facilitating better decision-making and strategic insights. As businesses continue to embrace digital transformation, the synergy between Ontology Management AI and Named Ent

  4. Self-contained ground-truths for cross-domain linkage

    • figshare.com
    zip
    Updated Apr 28, 2016
    Cite
    Mayank Kejriwal (2016). Self-contained ground-truths for cross-domain linkage [Dataset]. http://doi.org/10.6084/m9.figshare.3204325.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mayank Kejriwal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cross-domain knowledge bases such as DBpedia, Freebase and YAGO have emerged as encyclopedic hubs in the Web of Linked Data. Despite enabling several practical applications in the Semantic Web, the large-scale, schema-free nature of such graphs often precludes research groups from employing them widely as evaluation test cases for entity resolution and instance-based ontology alignment applications. Although the ground-truth linkages between the three knowledge bases above are available, they are not amenable to resource-limited applications. One reason is that the ground-truth files are not self-contained, meaning that a researcher must usually perform a series of expensive joins (typically in MapReduce) to obtain usable information sets. We constructed this resource by uploading several publicly licensed data resources to the public cloud and used simple Hadoop clusters to compile, and make accessible, three cross-domain self-contained test cases involving linked instances from DBpedia, Freebase and YAGO. Self-containment is enabled by virtue of a simple NoSQL JSON-like serialization format. Potential applications for these resources, particularly related to testing transfer learning research hypotheses, are described in more detail in a paper submission in the resource track at ISWC 2016.

  5. Asia Pacific B2C Consumer Contact Lookup - Privacy-Compliant Identity Resolution

    • datarade.ai
    .csv, .xls
    Updated May 23, 2025
    Cite
    eGentic (2025). Asia Pacific B2C Consumer Contact Lookup - Privacy-Compliant Identity Resolution [Dataset]. https://datarade.ai/data-products/asia-pacific-b2c-consumer-contact-lookup-privacy-compliant-egentic
    Dataset authored and provided by
    eGentic
    Area covered
    Asia, Australia, New Zealand, Philippines, Hong Kong, Indonesia, Singapore, Malaysia, Thailand, South Africa, Taiwan
    Description

    Key Features:
    • Matches emails, phone numbers, and names to consumer profiles
    • Appends additional contact fields and demographic attributes (where available)
    • Built on permission-based, privacy-compliant global data sources
    • High match rates for reliable identity resolution

    What You Can Match & Append:
    • Full Name
    • Email Address
    • Phone Number
    • Physical Address (City, Zipcode, Country - based on availability)

    Use Cases:
    • Customer record enrichment
    • Identity resolution and deduplication
    • Fraud prevention and validation

    Data Format: Emails, Phone Numbers, or Mixed Identifier Inputs

    Data Delivery: SFTP

    Perfect For:
    • Identity & Fraud Solutions
    • Data Brokers & Enrichment Providers
    • Customer Intelligence & Insights Teams

  6. Decentralized Municipal Entities v1.0 - January 2025

    • catalegs.ide.cat
    Updated Mar 27, 2025
    Cite
    (2025). Decentralized Municipal Entities v1.0 - January 2025 [Dataset]. https://catalegs.ide.cat/geonetwork/sidl/search?format=DWG
    Description

    Geographic base of the decentralized municipal entities (EMD) of Catalonia, with their names and codes, derived from the delimitation files of the entities and of the municipalities to which they belong, and from reference cartographic sources. Among the EMD polygon boundaries represented in this base, only those backed by a recognition act, a resolution published in the DOGC, or a judicial resolution that determines the current official line are definitive, and only if they have coordinates in the current reference system. All other boundaries must be considered provisional. EMD are entities whose territorial scope is smaller than a municipality, that is, local governments below the municipal level.
