Analysis of IP holdings (active patents and trademarks) can shed light on technology and innovation at the corporate level. Insight is achieved from a variety of analyses, for example:
How does the corporate IP portfolio of a given company compare to its competitors?
Who are new entrants in the sector with similar technologies, based on their intellectual property filings?
How has a company's IP filing activity changed over time? Are patents and trademarks being filed in the same classes as before, or in new or different classes, indicating a shift to new products or services, or innovation into potential new areas and technologies?
Coverage includes Intellectual Property registries from the USA, Canada and Europe.
https://www.technavio.com/content/privacy-notice
Physical Intellectual Property Market Size 2025-2029
The physical intellectual property market size is forecast to increase by USD 3.41 billion, at a CAGR of 7.4% from 2024 to 2029. The growing complexity of ICs will drive the physical intellectual property market.
Major Market Trends & Insights
North America dominated the market and accounted for 51% of growth during the forecast period.
By Application - Mobile computing devices segment was valued at USD 2.86 billion in 2023
By End-user - Semiconductor segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 72.35 million
Market Future Opportunities: USD 3411.70 million
CAGR: 7.4%
North America: Largest market in 2023
Market Summary
The market encompasses the licensing, buying, and selling of tangible inventions and creations, primarily focusing on core technologies and applications such as semiconductors, biotechnology, and mechanical designs. With the growing complexity of integrated circuits and the proliferation of wireless technologies, the demand for configurable semiconductor IP continues to surge. Service types or product categories, including patent licensing, patent enforcement, and patent valuation, play a crucial role in this market. Regulatory compliance, particularly in the context of intellectual property laws and international trade agreements, poses challenges for market participants. Looking forward, the market is expected to unfold with significant opportunities, particularly in emerging economies, as they increasingly prioritize innovation and IP protection.
According to recent reports, the patent licensing segment is projected to account for over 60% of the market share, underscoring its dominance in the landscape.
What will be the Size of the Physical Intellectual Property Market during the forecast period?
How is the Physical Intellectual Property Market Segmented and what are the key trends of market segmentation?
The physical intellectual property industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Application: Mobile computing devices; Consumer electronic devices; Automotive; Industrial automation; Others
End-user: Semiconductor; Manufacturing; IT and telecom; Others
Type: Patents; Licensing; Copyrights; Architectural design rights; Others
Geography: North America (US, Canada); Europe (France, Germany, UK); APAC (Australia, China, Japan, South Korea, Taiwan); Rest of World (ROW)
By Application Insights
The mobile computing devices segment is estimated to witness significant growth during the forecast period.
The market encompasses various aspects, including intellectual property licensing, brand asset valuation, copyright infringement litigation, technology transfer agreements, trademark registration process, IP portfolio optimization, design patent applications, trade secret protection, competitive intelligence gathering, IP asset monetization, knowledge management systems, utility patent prosecution, IP litigation strategies, patent portfolio management, patent landscape analysis, IP enforcement actions, IP valuation methodologies, licensing revenue forecasting, franchise agreements, portfolio diversification strategy, transactional IP law, technology valuation models, IP asset registry, IP risk assessment, technology commercialization, confidentiality agreements, royalty income streams, software license compliance, non-compete clauses, digital rights management, data privacy regulations, and open-source software licensing. In the mobile computing devices segment, the demand for physical intellectual property is on the rise due to the increasing need for higher processing power in mobile and other computing devices.
This trend is fueled by the growing popularity of mobile computing devices such as smartphones, tablets, laptops, and ultrabooks. Chinese manufacturers like BBK Electronics, Huawei Technologies, and Xiaomi are leading this segment with their competitively priced devices offering upgraded technologies. The disposable income of consumers in developing countries, particularly India, is another significant factor contributing to the growth of mobile computing devices. Additionally, increasing internet penetration is playing a crucial role in driving the demand for these devices. According to recent studies, the adoption of mobile computing devices has grown by 18.7%, and it is projected to expand by 25.6% in the coming years.
The Mobile computing devices segment was valued at USD 2.86 billion in 2019 and showed a gr
Why Lighthouse IP data
World’s widest IP coverage: 170 patent authorities, 198 trademark authorities, 101 design authorities. See lighthouseip.com.
Get in touch to request an expanded sample bucket!
Structured & enriched: bibliographic core, legal-status events, full-text, citations, valuations, litigation tags.
Fresh: daily or weekly updates direct from gazettes and PTO APIs.
AI-ready: word2vec/BERT vector packs (VaaS) and clean JSON/XML make LLM and semantic-search projects plug-and-play.
Data modules
- Patents: Diamond File (biblio); Legal-status feed with 130 offices; Full-text (claims, description, machine-translated); PDF facsimiles & file wrappers (USPTO)
- IP-BI valuation scores & Wart Index risk metrics
- Trademarks: 202 million marks, Nice classes, owner normalisation, 190 million images.
- Designs: global drawings, titles, status.
- Litigation & post-grant: PTAB, EPO oppositions, worldwide court tags.
- Vectorisation as a Service: 300-dimensional embeddings for clustering and GPT-style prompting.
Geographic & historical coverage
- Patents back to 1836 (US) and first filings per jurisdiction.
- Trademarks and designs back to each office’s digitisation cut-off.
- Continuous capture for emerging offices ensures no blind spots.
Delivery options
Method | Highlights | Cadence | Formats
Custom feeds | WIPO ST.36-compliant schema | Daily / Weekly | XML, JSON, CSV
Bulk S3 replication | Original PDFs & images | Weekly | PDFs, TIFF, PNG
Filters by date range, region, status and data type keep payload lean.
Typical use cases
- Patent analytics dashboards
- Freedom-to-operate & competitive intelligence
- Brand watch services & conflict detection
- LLM fine-tuning, semantic-search, embeddings benchmarking
- Portfolio valuation, deal sourcing, investor due diligence
Key benefits in one glance
- Comprehensiveness: patents, trademarks, designs under one licence.
- Quality: normalised assignees, de-duplicated families, verified legal events.
- Speed: new publications live within hours of PTO release.
- Scalability: terabytes via cloud or lightweight API endpoints.
- Support: schema docs, sample code, dedicated data-engineering team.
Ready to power your next IP insight or AI model? Reach out via info@lighthouseip.com for a trial feed.
https://www.technavio.com/content/privacy-notice
Intellectual Property Software Market Size 2025-2029
The intellectual property software market size is forecast to increase by USD 7.96 billion, at a CAGR of 19.6% from 2024 to 2029. Rising adoption of intellectual property software to improve enterprise efficiency will drive the intellectual property software market.
Major Market Trends & Insights
North America dominated the market and accounted for 38% of growth during the forecast period.
By Deployment - On-premises segment was valued at USD 1.97 billion in 2023
By Component - Software segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 428.47 million
Market Future Opportunities: USD 7960.30 million
CAGR: 19.6%
North America: Largest market in 2023
Market Summary
The market encompasses a continually evolving landscape of technologies and applications designed to manage and protect intellectual assets. Core technologies, such as Artificial Intelligence (AI) and Blockchain, are revolutionizing the way intellectual property (IP) is created, registered, and enforced. According to recent studies, the adoption rate of AI in IP management is projected to reach 60% by 2025. However, the market faces challenges, including the lack of strict IP laws in certain regions and the increasing complexity of IP portfolios. Despite these hurdles, opportunities abound, with the global IP software market expected to reach a significant market share in the coming years. This dynamic market is shaped by various factors, including advancements in technology, regulatory changes, and regional trends.
What will be the Size of the Intellectual Property Software Market during the forecast period?
How is the Intellectual Property Software Market Segmented and what are the key trends of market segmentation?
The intellectual property software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment: On-premises; Cloud-based
Component: Software; Service
Application: Licensing; Patent Management; Trademark Management; Copyright Management; Contract Management; Others
End-User Industry: Legal Firms; Technology; Media & Entertainment
Component Specificity: Core Software; Support Services; Consulting
Integration Type: Standalone; Integrated with ERP; API-enabled
Geography: North America (US, Canada); Europe (France, Germany, UK); APAC (Australia, China, India, Japan); South America (Brazil); Rest of World (ROW)
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period.
Intellectual property software plays a pivotal role in various industries, enabling technology scouting, invention disclosure, and innovation management. According to recent studies, the adoption of intellectual property software in the technology sector has increased by 18.7%, while the healthcare sector has seen a growth of 21.3% in its usage. Digital rights management and infringement detection are critical functions that have witnessed a surge in demand, with a 25.6% and 27.9% increase in adoption, respectively.
Moreover, the intellectual property market is evolving, with a shift towards cloud-based solutions. Contract lifecycle management, patent management systems, and knowledge management systems are among the cloud-based intellectual property software categories experiencing significant growth. Cloud-based solutions offer flexibility, ease of use, and cost savings, making them increasingly popular among businesses. Despite this trend, the on-premises segment continues to hold a substantial market share. On-premises intellectual property software is highly secure and customizable, with the BFSI and healthcare sectors being major adopters. However, its market share is projected to decline during the forecast period due to the growing preference for cloud-based solutions.
Furthermore, intellectual property software solutions cater to various aspects of IP management, including trademark monitoring, IP risk management, and IP strategy development. These tools help businesses optimize their IP portfolios, protect their brands, and monetize their intellectual assets. Additionally, they facilitate technology commercialization, technology transfer, and IP valuation methods.
In conclusion, the market is dynamic and diverse, catering to the evolving needs of businesses across sectors. Its applications range from technology scouting and invention disclosure to IP portfolio optimization and monetization strategies. The market's continuous growth is driven by factors such as the increasing importance of data privacy compliance, due diligence software, and IP rights enforcement.
The On-pr
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The sectoral approach:
The sectoral approach is an aggregation of the manufacturing industries according to technological intensity (R&D expenditure/value added) and based on the Statistical classification of economic activities in the European Community (NACE, http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:NACE) at 2-digit level. The level of R&D intensity served as a criterion of classification of economic sectors into high-technology, medium high-technology, medium low-technology and low-technology industries.
Services are mainly aggregated into knowledge-intensive services (KIS) and less knowledge-intensive services (LKIS) based on the share of tertiary educated persons at NACE 2-digit level.
The sectoral approach is used for all indicators except data on high-tech trade and patents.
Note that due to the revision of the NACE from NACE Rev. 1.1 to NACE Rev. 2, the definition of high-technology industries and knowledge-intensive services changed in 2008. For high-tech statistics this means that two different definitions (one according to NACE Rev. 1.1 and one according to NACE Rev. 2) are used in parallel, and the data according to both NACE versions are presented in separate tables depending on data availability. For example, as the LFS provides results by both NACE Rev. 1.1 and NACE Rev. 2, all the tables using this source have been duplicated to present the results by NACE Rev. 2 from 2008. For more details, see both definitions of high-tech sectors in Annex 2 and 3.
Within the sectoral approach, a second classification was created, named Knowledge Intensive Activities (KIA), based on the share of tertiary educated people in each sector of industries and services according to NACE at 2-digit level and for all EU Member States. A threshold was applied to judge sectors as knowledge intensive. In contrast to the first sectoral approach, which mixes two methodologies (one for manufacturing industries and one for services), the KIA classification is based on one methodology for all sectors of industries and services, covering even public sector activities.
The aggregations in use are Total Knowledge Intensive Activities (KIA) and Knowledge Intensive Activities in Business Industries (KIABI). Both classifications are made according to NACE Rev. 1.1 and NACE Rev. 2 at 2-digit level. Note that due to the revision of NACE Rev. 1.1 to NACE Rev. 2 the list of Knowledge Intensive Activities has changed as well; the two definitions are used in parallel and the data are shown in two separate tables. The NACE Rev. 2 collection includes data starting from the 2008 reference year. For more details please see the definitions in Annex 7 and 8.
The product approach:
The product approach was created to complement the sectoral approach and it is used for data on high-tech trade. The product list is based on the calculations of R&D intensity by groups of products (R&D expenditure/total sales). The groups classified as high-technology products are aggregated on the basis of the Standard International Trade Classification (SITC).
The initial definition was built on SITC Rev.3 and served to compile the high-tech product aggregates until 2007. With the implementation in 2007 of SITC Rev.4, the definition of high-tech groups was revised and adapted to the new classification. Starting from 2007, Eurostat presents the trade data for high-tech groups aggregated on the basis of SITC Rev.4. For more details, see the definition of high-tech products in Annex 4 and 5.
High-tech patents:
High-tech patents are defined according to another approach. The groups classified as high-tech patents are aggregated on the basis of the International Patent Classification (IPC 8th edition).
Biotechnology patents are also aggregated on the basis of the IPC 8th edition. For more details, see the aggregation list of high-tech and biotechnology patents in Annex 6.
The high-tech domain also comprises the sub-domain Venture Capital Investments: data are provided by INVEST Europe (http://www.investeurope.eu/), formerly named the European Private Equity and Venture Capital Association (EVCA). More details are available in the Eurostat metadata under Venture capital investments.
Please note that for paragraphs where no metadata for regional data has been specified, the regional metadata is identical to the metadata provided for the national data.
Your Gateway to Informed Decision-Making. Explore comprehensive legal information, including Intellectual Property, Patents, Courts, Litigation, Royalty Rates, Trademarks, Attorneys, Legal Parties, and Copyright data. Empower your research, stay ahead of legal trends, and make data-driven decisions.
The Survey on Innovation and Patent Use is an ad-hoc survey administered by the ONS on behalf of the Intellectual Property Office (IPO). The IPO is an Executive Agency of the Department for Business, Energy and Industrial Strategy (BEIS) and the official government body responsible for Intellectual Property (IP) rights in the United Kingdom. Its mission is to 'promote innovation by providing a clear, accessible and widely understood IP system, which enables the economy and society to benefit from knowledge and ideas'.
The survey design comprises a short telephone-based survey to collect information on innovation and intellectual property use in the United Kingdom. Questions asked relate to product and process innovations and patents. Data collection has covered a five- to seven-week period. The sample consists of businesses that consented to be contacted again after previously taking part in the UK Innovation Survey (UKIS) (available from the UK Data Archive under SN 6699), a survey that is administered by the ONS on behalf of BEIS.
The results of the survey are used for research into the ways that businesses work with patents which will inform the IPO about the ways that the UK system of property rights can be improved to help inventors and innovators.
Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.
Postcode Districts
The postcode district variable is available only for the 2013 data.
The second edition (November 2018) includes data and documentation for 2015.
Trademark data aggregated across multiple Intellectual Property (IP) registries, including USPTO, CIPO, EUIPO and WIPO (USA, Canada, Europe). Full dataset updated weekly, available via customized reports, raw feed, or one-off reports. Full bibliographic data provided for each trademark record: filing date, registration date, NICE classification, trademark name, type, etc. Ownership/entity relationship mapping, ticker mapping, ISIN mapping, Crunchbase uuid mapping, Crunchbase domain mapping. We also provide our proprietary IP Activity Score for each owner, which can help compare recent innovation activity amongst owners, as reflected in their intellectual property filings.
Ipqwery's Trademark dataset is also available as a combined dataset with our Patent dataset, enabling full IP profiles for corporate entities.
https://www.datainsightsmarket.com/privacy-policy
The Intellectual Property (IP) Management Software market is experiencing robust growth, driven by the increasing need for efficient IP portfolio management, rising digitalization across industries, and the escalating importance of protecting intellectual assets in a competitive global landscape. The market, estimated at $2.5 billion in 2025, is projected to experience a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching approximately $7.2 billion by 2033. Key growth drivers include the rising complexity of IP rights management, the need for enhanced collaboration among internal and external stakeholders, and the demand for streamlined workflows to optimize patent prosecution, licensing, and enforcement processes. The software market caters to a diverse range of clients, from small businesses to multinational corporations across various sectors like pharmaceuticals, technology, and manufacturing. The competitive landscape features a mix of established players like Clarivate, Questel, and Anaqua, and emerging innovative companies offering specialized solutions. Continued advancements in Artificial Intelligence (AI) and machine learning are expected to further fuel market expansion, automating tasks and providing more insightful analytics for strategic IP decision-making.
The market segmentation reveals significant opportunities within specific niches. For example, cloud-based solutions are gaining traction due to their scalability, accessibility, and cost-effectiveness. Likewise, specialized modules addressing specific IP asset types, such as trademarks or copyrights, are witnessing strong growth. However, challenges remain, such as the high initial investment costs associated with implementing comprehensive IP management software and the need for ongoing training and support for users. Furthermore, data security and integration with existing systems continue to be important considerations for organizations adopting such solutions. Nevertheless, the long-term outlook for the IP management software market remains exceptionally positive, underpinned by the enduring value of intellectual property and the continuous evolution of technology to manage and leverage it more effectively.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for U.S. Granted Patents: Total Patents Originating in the United Arab Emirates (PATENT4NAETOTAL) from 1992 to 2020 about United Arab Emirates, patent granted, patents, intellectual property, origination, foreign, and USA.
According to our latest research, the global IP Infringement Detection market size reached USD 1.87 billion in 2024, reflecting robust growth driven by escalating digitalization and the rising importance of intellectual property protection across sectors. The market is expanding at a significant CAGR of 13.2% and is forecasted to attain a value of USD 5.34 billion by 2033. This remarkable growth trajectory is underpinned by increasing incidences of IP theft, the proliferation of online content, and the need for automated, scalable detection systems to safeguard innovations and creative assets.
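The relationship between a base value, a CAGR and a forecast value can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes annual compounding over nine periods from the 2024 base year to 2033, a convention the text does not state. Under a different base year or period count the result will differ, so a gap between the computed and the reported figure does not by itself indicate an error in the report.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound `base` forward at rate `cagr` (0.132 means 13.2%) for `years` periods."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Return the constant annual rate that takes `start` to `end` in `years` periods."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures quoted in the text, in USD billion.
    base_2024, reported_2033, rate = 1.87, 5.34, 0.132
    # Assumption: nine annual compounding periods, 2024 to 2033.
    periods = 2033 - 2024
    print(f"projected 2033 value: {project_value(base_2024, rate, periods):.2f}")
    print(f"rate implied by the endpoints: {implied_cagr(base_2024, reported_2033, periods):.1%}")
```

Running both directions (rate to end value, and endpoints to rate) is a quick way to see which period convention a report is most likely using.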
One of the primary growth factors for the IP Infringement Detection market is the exponential increase in digital content creation and distribution, particularly in sectors such as media & entertainment, IT & telecom, and retail & e-commerce. As organizations and individuals produce and share intellectual property at an unprecedented rate, the risk of unauthorized use or replication has grown considerably. This surge in content has necessitated the deployment of advanced software solutions capable of monitoring, detecting, and responding to infringement activities in real time. Additionally, the growing sophistication of cyber threats and the emergence of new infringement tactics have compelled organizations to invest in comprehensive IP protection frameworks, further fueling market demand.
Another significant driver is the evolving regulatory landscape, which is becoming increasingly stringent regarding intellectual property rights enforcement. Governments worldwide are implementing stricter laws and policies to combat copyright, trademark, and patent violations, thereby pushing enterprises to adopt robust IP infringement detection systems. Furthermore, the global expansion of digital commerce and cross-border transactions has made it imperative for businesses to protect their IP assets in multiple jurisdictions. This regulatory impetus, coupled with the reputational and financial risks associated with IP theft, is prompting organizations across various industries to prioritize investment in advanced detection technologies.
Technological advancements are also playing a pivotal role in shaping the IP Infringement Detection market. The integration of artificial intelligence, machine learning, and big data analytics into detection platforms has significantly enhanced their accuracy, scalability, and efficiency. These technologies enable the automated scanning of vast digital ecosystems, identification of subtle infringement patterns, and provision of actionable insights for legal and compliance teams. The continuous evolution of such technologies not only addresses current challenges but also anticipates future threats, ensuring that organizations remain one step ahead in protecting their intellectual property portfolios.
From a regional perspective, North America continues to dominate the IP Infringement Detection market due to its advanced technological infrastructure, high concentration of IP-intensive industries, and proactive regulatory environment. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitization, increasing R&D investments, and heightened awareness of IP protection among enterprises. Europe, with its strong legal frameworks and focus on innovation, also holds a substantial share, while Latin America and the Middle East & Africa are witnessing steady growth as local industries embrace digital transformation and IP-centric business models.
The Component segment of the IP Infringement Detection market is bifurcated into software and services, each playing a critical role in the overall ecosystem. Software solutions form the backbone of IP infringement detection, leveraging advanced algorithms and data analytics to monitor vast digital landscapes for unauthorized use of intellectual property. These platforms are designed to automate the detection process, reduce manual intervention, and provide r
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data we have uploaded consists of partial sample data from the 'pepper_biomass' dataset. Due to reasons concerning intellectual property and personal privacy, we have chosen not to publicly disclose the complete dataset. However, interested parties may contact the corresponding author of the article to request access to the full dataset, provided that they make a reasonable academic inquiry.
FAIR and open research data is vital for advancing science. But not all data can be openly published, for example to protect intellectual property. Who can publish or reuse data is often governed by agreements or policies. And even if published data is available, it might not be usable for every purpose due to limitations by the license.
Join us in a brief journey through legal aspects that should be considered when using and reusing data and an introduction to licenses that can make sure that your data can be reused exactly in the way you intend it to be.
Stay tuned for more exciting content, and thank you for being a part of our growing community!
Customer contact data helps support the provision of the corporate data as well as assisting customers with their dealings with IPO. For example, contacting customers regarding acceptance or rejection of trade marks, patents or designs, or usage of products and services.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Intellectual property is crucial for the development of firms. At the micro level, a firm's comprehensive intellectual property ability covers intellectual property creation, utilization, protection, and management. In order to develop the comprehensive intellectual property ability of firms, the China National Intellectual Property Administration began to implement the national intellectual property demonstration advantage firm (NIPDAF) policy in 2013. Based on this exogenous policy shock, and using data from listed companies from 2011 to 2020 as the research sample, the time-varying DID method is used to test the impact of the NIPDAF policy, which is intended to cultivate comprehensive intellectual property ability, on firm productivity. The results show that after policy implementation, the total factor productivity of NIPDAFs increased by about 3.3% compared to the control group. This finding is robust after a series of tests. Furthermore, the NIPDAF policy promotes firm productivity by stimulating technology innovation, improving investment efficiency, and enhancing competitive advantage. In addition, the NIPDAF policy has a more significant incentive effect on the total factor productivity of non-state-owned enterprises, firms in the eastern region, and firms in patent-intensive industries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all “license files” extracted from a snapshot of the Software Heritage archive taken on 2022-04-25. (Other, possibly more recent, versions of the datasets can be found at https://annex.softwareheritage.org/public/dataset/license-blobs/).
In this context, a license file is a unique file content (or “blob”) that appeared in a software origin archived by Software Heritage as a file whose name is often used to ship licenses in software projects. Some name examples are: COPYING, LICENSE, NOTICE, COPYRIGHT, etc. The exact file name pattern used to select the blobs contained in the dataset can be found in the SQL query file 01-select-blobs.sql. Note that the file name was not expected to be at the project root, because project subdirectories can contain different licenses than the top-level one, and we wanted to include those too.
Format
The dataset is organized as follows:
blobs.tar.zst: a Zst-compressed tarball containing deduplicated license blobs, one per file. The tarball contains 6’859’189 blobs, for a total uncompressed size on disk of 66 GiB.
The blobs are organized in a sharded directory structure that contains files named like blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02, where:
blobs/ is the root directory containing all license blobs
8624bcdae55baeef00cd11d5dfcfa60f68710a02 is the SHA1 checksum of a specific license blob, a copy of the GPL3 license in this case. Each license blob is ultimately named with its SHA1:
$ head -n 3 blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
$ sha1sum blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02
8624bcdae55baeef00cd11d5dfcfa60f68710a02 blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02
86 and 24 are, respectively, the first and second group of two hex digits in the blob SHA1
One blob is missing because its size (313MB) prevented its inclusion (it was originally a tarball containing source code):
swh:1:cnt:61bf63793c2ee178733b39f8456a796b72dc8bde,1340d4e2da173c92d432026ecdc54b4859fe9911,"AUTHORS"
blobs-sample20k.tar.zst: analogous to blobs.tar.zst, but containing “only” 20’000 randomly selected license blobs
license-blobs.csv.zst: a Zst-compressed CSV index of all the blobs in the dataset. Each line in the index (except the first one, which contains column headers) describes a license blob and is in the format SWHID,SHA1,NAME, for example:
swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING"
swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GPL3"
swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GLP-3"
where:
SWHID: the Software Heritage persistent identifier of the blob. It can be used to retrieve and cross-reference the license blob via the Software Heritage archive, e.g., at: https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
SHA1: the blob SHA1, which can be used to cross-reference blobs in the blobs/ directory
NAME: a file name given to the license blob in a given software origin. As the same license blob can have different names in different contexts, the index contains multiple entries for the same blob with different names, as is the case in the example above (yes, one of those has a typo in it, but it’s an original typo from some repository!).
blobs-fileinfo.csv.zst: a Zst-compressed CSV mapping from blobs to basic file information in the format: SHA1,MIME_TYPE,ENCODING,LINE_COUNT,WORD_COUNT,SIZE, where:
blobs-scancode.csv.zst: a Zst-compressed CSV mapping from blobs to software license detected in them by ScanCode, in the format: SHA1,LICENSE,SCORE, where:
There may be zero or arbitrarily many lines for each blob.
blobs-scancode.ndjson.zst: a Zst-compressed line-delimited JSON, containing a superset of the information in blobs-scancode.csv.zst. Each line is a JSON dictionary with three keys:
scancode.api.get_licenses(..., min_score=0)
scancode.api.get_copyrights(...)
There is exactly one line for each blob. The licenses and copyrights keys are omitted for files not detected as plain text.
blobs-origins.csv.zst: a Zst-compressed CSV mapping of where license blobs come from. Each line in the index associates a license blob to one of its origins in the format SWHID, for example:
swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2 https://github.com/pombreda/Artemis
Note that a license blob can come from many different places; only an arbitrary (and somewhat random) one is listed in this mapping.
If no origin URL is found in the Software Heritage archive, then a blank is used instead. This happens when the blob's origin was either still being loaded when the dataset was generated, or the loader process crashed before completing the ingestion of the blob's origin.
blobs-nb-origins.csv.zst: a Zst-compressed CSV mapping of how many origins of this blob are known to Software Heritage. Each line in the index associates a license blob to this count in the format SWHID, for example:
swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2 2822260
Two blobs are missing because the computation crashed:
swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
swh:1:cnt:8b137891791fe96927ad78e64b0aad7bded08bdc
This issue will be fixed in a future version of the dataset.
blobs-earliest.csv.zst: a Zst-compressed CSV mapping from blobs to information about their (earliest) known occurrence(s) in the archive. Format: SWHID, where:
replication-package.tar.gz: code and scripts used to produce the dataset
licenses-annotated-sample.tar.gz: ground truth, i.e., manually annotated random sample of license blobs, with details about the kind of information they contain.
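The sharded layout and the SWHID,SHA1,NAME index described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's replication package: the function names, the literal header row `SWHID,SHA1,NAME`, and the in-memory sample are assumptions made for the example. The SWHID helper relies on the fact that a Software Heritage content identifier is a SHA1 computed over a Git-style `blob <size>\0` header followed by the bytes, which is why the SWHID and SHA1 columns of the index differ for the same blob.

```python
import csv
import hashlib
import io


def blob_path(sha1_hex: str, root: str = "blobs") -> str:
    """Build the sharded path: root/<first 2 hex digits>/<next 2>/<full SHA1>."""
    return f"{root}/{sha1_hex[0:2]}/{sha1_hex[2:4]}/{sha1_hex}"


def content_swhid(data: bytes) -> str:
    """Compute a content SWHID: SHA1 over a Git-style blob header plus the data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def read_index(text: str):
    """Yield (swhid, sha1, name) from index text, skipping the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # the first line holds column headers
    for swhid, sha1, name in reader:
        yield swhid, sha1, name


# Sample rows copied from the index example; the header text is assumed.
sample = (
    "SWHID,SHA1,NAME\n"
    'swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING"\n'
    'swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GPL3"\n'
    'swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GLP-3"\n'
)

# Group the file names under which each blob was seen, keyed by blob SHA1.
names_by_sha1 = {}
for _swhid, sha1, name in read_index(sample):
    names_by_sha1.setdefault(sha1, []).append(name)

for sha1, names in names_by_sha1.items():
    print(blob_path(sha1), names)
```

With the real files, the decompressed contents of license-blobs.csv.zst would take the place of `sample`; decompression is left out here because it needs a Zstandard tool or library outside the Python standard library.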
Changes since the 2021-03-23 dataset
More input data, due to the SWH archive growing: more origins in supported forges and package managers; and support for more forges and package
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We selected A-share listed companies on the Shanghai and Shenzhen stock exchanges in the period 2015–2022 as the initial sample. We sourced patent data from the State Intellectual Property Office of China and obtained customer concentration and corporate financial data from the CSMAR financial database. We applied the following exclusion criteria: (1) financial sector companies, including banks, securities, and insurance, due to the special nature of the sector’s accounting treatment; (2) companies in digital technology-intensive industries, such as information transmission, software, and information technology services; (3) companies with special treatment status; and (4) companies with missing financial data. We ultimately obtained 19,985 firm-year observations. To mitigate outliers’ influence, continuous variables were winsorized at the 1% level at both tails.
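The winsorization step can be illustrated with a short Python sketch. It assumes a linear-interpolation percentile and clips values below the 1st percentile and above the 99th percentile to those cut-offs; the description above does not state which quantile convention was used, so the helper names and the convention are assumptions, and the input values are toy data rather than sample observations.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, with q in [0, 1]."""
    if not sorted_vals:
        raise ValueError("percentile of empty data")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)  # index just below the target position
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] percentile range, keeping input order."""
    ordered = sorted(values)
    lo_cut = percentile(ordered, lower)
    hi_cut = percentile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Toy data: 0..100, so only the two extreme values get pulled in.
toy = list(range(101))
clipped = winsorize(toy)
print(clipped[:3], clipped[-3:])
```

Unlike trimming, winsorizing keeps every observation and only replaces the extreme values, so the number of firm-year observations is unchanged by this step.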
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
AEGIS Sample Dataset
This is a sample dataset containing 48 diverse examples from the AEGIS training corpus.
Purpose
This sample demonstrates:
Data format and structure
Attack family diversity
Preference pair quality
Multi-turn conversation patterns
Full Dataset
The complete AEGIS dataset (2,939 preference pairs) is private to protect intellectual property.
Data Format
Each example contains:
id: Unique identifier
prompt: Multi-turn conversation…
See the full description on the dataset page: https://huggingface.co/datasets/scthornton/aegis-demo-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background & Summary: The ALL-READY project unites a diverse consortium of Research Infrastructures (RI) and Living Labs, instrumental in developing new methodologies and technologies in agroecology. The project focuses on effective management of innovation, adherence to open science principles, and strategic application of Intellectual Property Rights (IPR). Task 6.4 of the project, which concentrates on Innovation and IPR Management, seeks to understand the dynamics influencing the adoption of these practices among its members. Recognizing the need for end-to-end data management, the project emphasizes standardized data collection and management while adhering to FAIR principles.
Methods: The questionnaire was developed by LifeWatch ERIC to capture data reflecting current practices and perceptions in agroecology. It included 26 questions divided into four sections, focusing on existing practices, potential drivers, and barriers in innovation management, open science, and IPR. The survey was disseminated via an online platform to the ALLREADY Pilot Network, ensuring a representative sample from diverse organizations. The data collection process was closely monitored, and the responses were analyzed using a mixed-methods approach to extract meaningful insights.
Data Records of the ALLREADY Project Questionnaire: The dataset, collected through an online survey platform, underwent a meticulous process of data preparation, download, formatting, and anonymization. It consists of one text file containing metadata (Readme.txt) and a single CSV file encompassing all questionnaire responses. The dataset provides a comprehensive view of innovation management, open science adoption, and IPR handling within the agroecology sector, particularly among the network of RIs and Living Labs involved in the project.
Technical Validation of the ALLREADY Project Questionnaire: Several critical steps were taken to ensure the accuracy, reliability, and overall quality of the data collected. This included development and testing of the questionnaire, rigorous monitoring of the data collection process, and thorough checks for data quality and completeness. The representativeness of the sample was analyzed specifically with respect to the Pilot Network rather than the broader population involved in agroecology. Strategies were employed to counter survey fatigue and maintain respondent engagement.
Usage Notes for the ALLREADY Project Questionnaire: The dataset's proper usage is vital for ensuring the validity and reproducibility of research. Researchers are advised to consider the nature of the data, the representativeness of the dataset, and its generalizability. The dataset allows for comprehensive analysis and integration of different sections, and analysts have the flexibility to handle open and write-in responses according to their research needs. Additional information to facilitate analysis is provided in a separate documentation file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data version: 3.3.0
Authors:
Bernhard Ganglmair (University of Mannheim, Department of Economics, and ZEW Mannheim)
W. Keith Robinson (Wake Forest University, School of Law)
Michael Seeligson (Southern Methodist University, Cox School of Business)
1. Notes on Data Construction
2. Citation and Code
3. Description of the Data Files
3.1. File List
3.2. List of Variables for Files with Claim-Level Information
3.3. List of Variables for Files with Patent-Level Information
4. Coming Soon!
Notes on Data Construction
This is version 3.3.0 of the patccat data (patent claim classification by algorithmic text analysis).
Patent claims define an invention. A patent application is required to have one or more claims that distinctly claim the subject matter which the patent applicant regards as her invention or discovery. We construct a classifier of patent claims that identifies three distinct claim types: process claims, product claims, and product-by-process claims.
For this classification, we combine information obtained from both the preamble and the body of a claim. The preamble is a general description of the invention (e.g., a method, an apparatus, or a device), whereas the body identifies steps and elements (specifying in detail the invention laid out in the preamble) that the applicant is claiming as the invention. The combination of the preamble type and the body type provides us with a more detailed and more accurate classification of claims than other approaches in the literature. This approach also accounts for unconventional drafting approaches. We eventually validate our classification using close to 10,000 manually classified claims.
The data files contain the results of our classification. We provide claim-level information for each independent claim of U.S. utility patents granted between 1836 and 2020. We also provide patent-level information, i.e., the counts of different claim types for a given patent.
For a detailed description of our classification approach, please take a look at the accompanying paper (Ganglmair, Robinson, and Seeligson 2022).
Please cite the following paper when using the data in your own work:
Ganglmair, Bernhard, W. Keith Robinson, and Michael Seeligson (2022): "The Rise of Process Claims: Evidence from a Century of U.S. Patents," unpublished manuscript available at https://papers.ssrn.com/abstract=4069994.
In the paper, we document the use of process claims in the U.S. over the last century, using the patccat data. We show an increase in the annual share of process claims of about 25 percentage points (from below 10% in 1920). This rise in process intensity of patents is not limited to a few patent classes, but we observe it across a broad spectrum of technologies. Process intensity varies by applicant type: companies file more process-intense patents than individuals, and U.S. applicants file more process-intense patents than foreign applicants. We further show that patents with higher process intensity are more valuable but are not necessarily cited more often. Last, process claims are on average shorter than product claims (with the gap narrowing since the 1970s).
We would love to see how other researchers use the data and eventually learn from it. If you have a discussion paper or a publication in which you use the data, please send us a copy at patccat.data@gmail.com.
We will make the R code used to construct the data available on GitHub with the next data version (version 3.4.0). Contact us at b.ganglmair@gmail.com if you would like to take a look at an earlier version of the code.
The data files contain claim-level information for independent claims of 10,140,848 U.S. utility patents granted between 1836 and 2020. The files further contain patent-level information for U.S. utility patents.
3.1. File List
claims-patccat-v3-3-sample.csv claim-level information for independent claims of a sample of 1000 patents issued between 1976 and 2020
claims-patccat-v3-3-1836-1919.csv claim-level information for independent claims of 1,038,041 patents issued between 1836 and 1919
claims-patccat-v3-3-1920-2020.csv claim-level information for independent claims of 9,102,807 patents issued between 1920 and 2020
patents-patccat-v3-3-sample.csv patent-level information for a sample of 1000 patents issued between 1976 and 2020
patents-patccat-v3-3-1836-1919.csv patent-level information for 1,038,041 patents issued between 1836 and 1919
patents-patccat-v3-3-1920-2020.csv patent-level information for 9,102,807 patents issued between 1920 and 2020
3.2. List of Variables for Files with Claim-Level Information
For detailed descriptions, see the appendix in Ganglmair, Robinson, and Seeligson (2022).
List of Variables (Claim-Level Information)
PatentClaim patent claim identifier; 8-digit patent number and 4-digit claim number (Ex: 01234567-0001)
singleLine =1 if claim is published in single-line format
singleReformat outcome code of reformatting of single-line claims
Jepson =1 if claim is a Jepson claim
JepsonReformat outcome code of reformatting of Jepson claims
inBegin =1 if claim begins with the word "in"
wordsPreamble number of words in the claim preamble
wordsBody number of words in the claim body
dependentClaims number of dependent claims that refer to this independent claim
isMeansPreamble =1 if term "means" is used in the preamble
isMeansBody =1 if term "means" is used in the body
isMeans =1 if term "means" is used anywhere in the claim (~ means-plus-function claim)
processPreamble =1 if terms "method" or "process" are used in the preamble
processBody =1 if terms "method" or "process" are used in the body
processSimple =1 if terms "method" or "process" are used anywhere in the claim (for simple approach of process claim classification)
claimType claim type of full classification (1 = process; 2 = product; 3 = product-by-process; 0 = no type)
preambleType preamble type
preambleTerm keyword used to classify preamble type
preambleTermAlt alternative keyword (if preambleTerm were not used)
preambleTextStub first 15 words of the preamble
bodyType body type
bodyLinesStep number of steps in the body
bodyLinesElement number of elements in the body
bodyLinesTotal total number of identified lines in the body
label 2-character label of the preamble-body combination; classification table maps label to claim type
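As a small illustration of the claim-level variables listed above, the following Python sketch splits a PatentClaim identifier into its patent and claim parts and decodes the claimType codes. The function and type names are hypothetical and not part of the patccat release; the validation simply follows the stated pattern of an 8-digit patent number, a hyphen, and a 4-digit claim number.

```python
from typing import NamedTuple

# Codes as documented for the claimType variable.
CLAIM_TYPES = {
    0: "no type",
    1: "process",
    2: "product",
    3: "product-by-process",
}


class ClaimId(NamedTuple):
    patent_number: str  # kept as text so leading zeros survive
    claim_number: int


def _is_ascii_digits(text: str) -> bool:
    return text.isascii() and text.isdigit()


def parse_patent_claim(identifier: str) -> ClaimId:
    """Split an identifier such as '01234567-0001' into patent and claim parts."""
    patent, sep, claim = identifier.partition("-")
    well_formed = (
        sep == "-"
        and len(patent) == 8
        and len(claim) == 4
        and _is_ascii_digits(patent)
        and _is_ascii_digits(claim)
    )
    if not well_formed:
        raise ValueError(f"unexpected PatentClaim format: {identifier!r}")
    return ClaimId(patent, int(claim))


example = parse_patent_claim("01234567-0001")
print(example.patent_number, example.claim_number, CLAIM_TYPES[1])
```

Keeping the patent number as a string means it can be matched against the 8-digit patent_id field of the patent-level files without re-padding.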
3.3. List of Variables for Files with Patent-Level Information
For detailed descriptions, see the appendix in Ganglmair, Robinson, and Seeligson (2022).
List of Variables (Patent-Level Information)
patent_id U.S. patent number (8-digit patent number)
claims number of independent claims (the sum of the four claim types: 0, 1, 2, and 3)
noCategory number of claims without a classified type
processClaims number of process claims
productClaims number of product claims
prodByProcessClaims number of product-by-process claims
firstClaim type of the first claim (1 = process; 2 = product; 3 = product-by-process; 0 = no type)
simpleProcessClaims number of process claims by simple approach (terms "method" or "process" anywhere in the claim)
simpleProcessPreamble number of process claims by simple approach (terms "method" or "process" in the preamble)
meansClaims number of means-plus-function claims
meansFirst =1 if first claim is a means-plus-function claim
JepsonClaims number of Jepson claims
JepsonFirst =1 if first claim is a Jepson claim
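The patent-level variables above imply a simple consistency check: claims should equal the sum of the four type counts. The Python sketch below shows that check together with a share of process claims per patent. The helper names are hypothetical, the row is made-up illustration data rather than a record from the files, and the share is only one plausible way to summarize a patent; it is not necessarily the process-intensity measure used in the paper.

```python
TYPE_COUNT_FIELDS = (
    "noCategory",
    "processClaims",
    "productClaims",
    "prodByProcessClaims",
)


def counts_are_consistent(row: dict) -> bool:
    """Check that claims equals the sum of the four claim-type counts."""
    return int(row["claims"]) == sum(int(row[f]) for f in TYPE_COUNT_FIELDS)


def process_claim_share(row: dict):
    """Share of independent claims classified as process claims, or None if there are none."""
    total = int(row["claims"])
    if total == 0:
        return None
    return int(row["processClaims"]) / total


# Hypothetical row, with values as strings the way csv.DictReader would return them.
row = {
    "patent_id": "01234567",
    "claims": "4",
    "noCategory": "0",
    "processClaims": "1",
    "productClaims": "3",
    "prodByProcessClaims": "0",
}

print(counts_are_consistent(row), process_claim_share(row))
```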
Note: The following variables/fields are currently empty (March 30, 2020); we will populate these variables/fields with data version 3.4.0.
preambleTerm, preambleTermAlt, preambleTextStub, bodyLinesStep, bodyLinesElement, bodyLinesTotal
Note: We will release the data for patents issued in 2021 with data version 3.4.0.
We are working on a number of extensions of the patccat data.