100+ datasets found
  1. Z

    Data from: A Large-scale Dataset of (Open Source) License Text Variants

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    Cite
    Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
    Explore at:
    Dataset updated
    Mar 31, 2022
    Dataset authored and provided by
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

    For more details see the included README file and companion paper:

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

    If you use this dataset for research purposes, please acknowledge its use by citing the above paper.

  2. d

    Department of Licensing Professional License Counts

    • catalog.data.gov
    • data.wa.gov
    Updated Jun 29, 2025
    Cite
    data.wa.gov (2025). Department of Licensing Professional License Counts [Dataset]. https://catalog.data.gov/dataset/active-professional-licenses-with-the-department-of-licensing
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    data.wa.gov
    Description

    This is a point-in-time count of active professional licenses, by County and State, issued by the Department of Licensing. These licenses are issued to people or businesses.

  3. d

    B2B Contact Data | 148MM+ High Quality US B2B Contacts | LinkedIn URL,...

    • datarade.ai
    .json, .csv, .xls
    Updated Jun 10, 2023
    Cite
    Salutary Data (2023). B2B Contact Data | 148MM+ High Quality US B2B Contacts | LinkedIn URL, Mobile Phone, Email Address, Current Job Title + More [Dataset]. https://datarade.ai/data-products/salutary-data-b2b-contact-data-62m-high-quality-us-b2b-c-salutary-data
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Jun 10, 2023
    Dataset authored and provided by
    Salutary Data
    Area covered
    United States
    Description

    Salutary Data is a boutique B2B contact and company data provider that's committed to delivering high quality data for sales intelligence, lead generation, marketing, recruiting / HR, identity resolution, and ML / AI. Our database currently consists of 148MM+ highly curated B2B Contacts (US only), along with 4M+ companies, and is updated regularly to ensure we have the most up-to-date information.

    We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use-case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end-users.

    What makes Salutary unique?

    • We offer our clients a truly unique, one-stop aggregation of the best-of-breed quality data sources. Our supplier network consists of numerous, established high quality suppliers that are rigorously vetted.
    • We leverage third party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
    • We're reasonably priced and easy to work with.

    Products: API Suite, Web UI, Full and Custom Data Feeds

    Services:

    • Data Enrichment - We assess the fill rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
    • ABM Match & Append - Send us your domain or other company related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally throw in your suppression file to avoid any redundant records.
    • Verification (“Cleaning/Hygiene”) Services - Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove your email hard bounces, and update/replace titles or phones. This is right up our alley and leverages our existing internal and external processes and systems.

  4. C

    Dashboard All Selected Codes

    • data.cityofchicago.org
    application/rdfxml +5
    Updated Jun 21, 2025
    Cite
    City of Chicago (2025). Dashboard All Selected Codes [Dataset]. https://data.cityofchicago.org/widgets/n4tc-syv8
    Explore at:
    Available download formats: xml, csv, tsv, json, application/rdfxml, application/rssxml
    Dataset updated
    Jun 21, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  5. Z

    Data from: The Software Heritage License Dataset (2022 Edition)

    • data.niaid.nih.gov
    Updated Jan 10, 2024
    Cite
    Sergio Montes-Leon (2024). The Software Heritage License Dataset (2022 Edition) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8200351
    Explore at:
    Dataset updated
    Jan 10, 2024
    Dataset provided by
    Jesus M. Gonzalez-Barahona
    Sergio Montes-Leon
    Gregorio Robles
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all “license files” extracted from a snapshot of the Software Heritage archive taken on 2022-04-25. (Other, possibly more recent, versions of the datasets can be found at https://annex.softwareheritage.org/public/dataset/license-blobs/).

    In this context, a license file is a unique file content (or “blob”) that appeared in a software origin archived by Software Heritage as a file whose name is often used to ship licenses in software projects. Some name examples are: COPYING, LICENSE, NOTICE, COPYRIGHT, etc. The exact file name pattern used to select the blobs contained in the dataset can be found in the SQL query file 01-select-blobs.sql. Note that the file name was not expected to be at the project root, because project subdirectories can contain different licenses than the top-level one, and we wanted to include those too.

    Format

    The dataset is organized as follows:

    blobs.tar.zst: a Zst-compressed tarball containing deduplicated license blobs, one per file. The tarball contains 6’859’189 blobs, for a total uncompressed size on disk of 66 GiB.

    The blobs are organized in a sharded directory structure that contains files named like blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02, where:

    blobs/ is the root directory containing all license blobs

    8624bcdae55baeef00cd11d5dfcfa60f68710a02 is the SHA1 checksum of a specific license blob, a copy of the GPL3 license in this case. Each license blob is ultimately named with its SHA1:

    $ head -n 3 blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02
    GNU GENERAL PUBLIC LICENSE
    Version 3, 29 June 2007

    $ sha1sum blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02
    8624bcdae55baeef00cd11d5dfcfa60f68710a02 blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02

    86 and 24 are, respectively, the first and second group of two hex digits in the blob SHA1
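
    The sharding rule above can be sketched in a few lines of Python. This is only an illustration: the helper names blob_path and matches_name are hypothetical and not part of the dataset's replication package.

```python
import hashlib
from pathlib import PurePosixPath


def blob_path(sha1_hex: str, root: str = "blobs") -> str:
    """Build the sharded path of a blob from its SHA1 hex digest.

    The first two hex digits name the first directory level and the
    next two name the second, as described for blobs.tar.zst.
    """
    sha1_hex = sha1_hex.lower()
    if len(sha1_hex) != 40 or any(c not in "0123456789abcdef" for c in sha1_hex):
        raise ValueError(f"not a SHA1 hex digest: {sha1_hex!r}")
    return str(PurePosixPath(root, sha1_hex[0:2], sha1_hex[2:4], sha1_hex))


def matches_name(content: bytes, sha1_hex: str) -> bool:
    """Check that a blob's content hashes to the SHA1 it is named after."""
    return hashlib.sha1(content).hexdigest() == sha1_hex.lower()


print(blob_path("8624bcdae55baeef00cd11d5dfcfa60f68710a02"))
# prints blobs/86/24/8624bcdae55baeef00cd11d5dfcfa60f68710a02
```

    matches_name mirrors the sha1sum check shown above and could be run on each extracted file to detect corruption.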

    One blob is missing because its size (313MB) prevented its inclusion (it was originally a tarball containing source code):

    swh:1:cnt:61bf63793c2ee178733b39f8456a796b72dc8bde,1340d4e2da173c92d432026ecdc54b4859fe9911,"AUTHORS"

    blobs-sample20k.tar.zst: analogous to blobs.tar.zst, but containing “only” 20’000 randomly selected license blobs

    license-blobs.csv.zst: a Zst-compressed CSV index of all the blobs in the dataset. Each line in the index (except the first one, which contains column headers) describes a license blob and is in the format SWHID,SHA1,NAME, for example:

    swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING"
    swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GPL3"
    swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GLP-3"

    where:

    SWHID: the Software Heritage persistent identifier of the blob. It can be used to retrieve and cross-reference the license blob via the Software Heritage archive, e.g., at: https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2

    SHA1: the blob SHA1, that can be used to cross-reference blobs in the blobs/ directory

    NAME: a file name given to the license blob in a given software origin. As the same license blob can have different names in different contexts, the index contains multiple entries for the same blob with different names, as is the case in the example above (yes, one of those has a typo in it, but it’s an original typo from some repository!).
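
    As a rough sketch of how the index could be consumed once decompressed, the following Python groups all file names by blob SHA1. The function name names_by_sha1 is hypothetical, and the exact header text in the sample is an assumption (the description only says the first line holds column headers). The standard csv module handles the quoted NAME column, including names that contain commas.

```python
import csv
from collections import defaultdict


def names_by_sha1(lines):
    """Group every file name seen for a blob under that blob's SHA1."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the first row, which holds the column headers
    names = defaultdict(set)
    for swhid, sha1, name in reader:
        names[sha1].add(name)
    return dict(names)


# Rows copied from the example above; the header text is an assumption.
SAMPLE = [
    "SWHID,SHA1,NAME",
    'swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING"',
    'swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GPL3"',
    'swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2,8624bcdae55baeef00cd11d5dfcfa60f68710a02,"COPYING.GLP-3"',
]

index = names_by_sha1(SAMPLE)
for sha1, names in index.items():
    print(sha1, sorted(names))
```

    Reading the real file would additionally require a Zstandard decompressor, which is not in the Python standard library and is left out of this sketch.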

    blobs-fileinfo.csv.zst: a Zst-compressed CSV mapping from blobs to basic file information in the format SHA1,MIME_TYPE,ENCODING,LINE_COUNT,WORD_COUNT,SIZE, where:

    SHA1: blob SHA1

    MIME_TYPE: blob MIME type, as detected by libmagic

    ENCODING: blob character encoding, as detected by libmagic

    LINE_COUNT: number of lines in the blob (only for textual blobs with UTF8 encoding)

    WORD_COUNT: number of words in the blob (only for textual blobs with UTF8 encoding)

    SIZE: blob size in bytes

    blobs-scancode.csv.zst: a Zst-compressed CSV mapping from blobs to the software licenses detected in them by ScanCode, in the format SHA1,LICENSE,SCORE, where:

    SHA1: blob SHA1

    LICENSE: license detected in the blob, as an SPDX identifier (or ScanCode identifier for non-SPDX-indexed licenses)

    SCORE: confidence score in the result, as a decimal number between 0 and 100

    There may be zero or arbitrarily many lines for each blob.
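
    Because a blob can have several SHA1,LICENSE,SCORE rows, a common first step is to keep only the highest-scoring detection per blob. The sketch below is one way to do that; the function name, the threshold, and the sample rows (including their SHA1s and scores) are made up for illustration. Whether the file starts with a header row is not stated above, so rows whose score is not numeric are simply skipped.

```python
import csv


def best_license_per_blob(lines, min_score=0.0):
    """Return {sha1: (license, score)} keeping the top-scoring row per blob."""
    best = {}
    for row in csv.reader(lines):
        if len(row) != 3:
            continue
        sha1, license_id, raw_score = row
        try:
            score = float(raw_score)
        except ValueError:
            continue  # e.g. a header row, if the file has one
        if score < min_score:
            continue
        if sha1 not in best or score > best[sha1][1]:
            best[sha1] = (license_id, score)
    return best


# Made-up rows: SHA1s and scores are illustrative, not taken from the dataset.
SHA_A = "a" * 40
SHA_B = "b" * 40
rows = [
    f"{SHA_A},MIT,95.5",
    f"{SHA_A},Apache-2.0,12.0",
    f"{SHA_B},GPL-3.0-only,100",
]

print(best_license_per_blob(rows, min_score=50))
```

    Blobs with no rows at all (or none above the threshold) are absent from the result, which matches the "zero or arbitrarily many lines" caveat.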

    blobs-scancode.ndjson.zst: a Zst-compressed line-delimited JSON file, containing a superset of the information in blobs-scancode.csv.zst. Each line is a JSON dictionary with three keys:

    sha1: blob SHA1

    licenses: output of scancode.api.get_licenses(..., min_score=0)

    copyrights: output of scancode.api.get_copyrights(...)

    There is exactly one line for each blob. licenses and copyrights keys are omitted for files not detected as plain text.
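
    Since the licenses and copyrights keys may be absent, a reader of this file should not index them unconditionally. A minimal Python sketch follows; the function name is hypothetical, and the payloads in the sample lines are placeholders rather than real ScanCode output.

```python
import json


def iter_scancode_records(lines):
    """Yield (sha1, licenses, copyrights) per line; missing keys become None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["sha1"], record.get("licenses"), record.get("copyrights")


# Placeholder lines: the nested payloads do not reproduce ScanCode's schema.
sample_lines = [
    '{"sha1": "' + "a" * 40 + '", "licenses": {}, "copyrights": {}}',
    '{"sha1": "' + "b" * 40 + '"}',  # a blob not detected as plain text
]

for sha1, licenses, copyrights in iter_scancode_records(sample_lines):
    is_text = licenses is not None and copyrights is not None
    print(sha1, "plain text" if is_text else "no ScanCode output")
```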

    blobs-origins.csv.zst: a Zst-compressed CSV mapping of where license blobs come from. Each line in the index associates a license blob with one of its origins in the format SWHID URL, for example:

    swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2 https://github.com/pombreda/Artemis

    Note that a license blob can come from many different places; only an arbitrary (and somewhat random) one is listed in this mapping.

    If no origin URL is found in the Software Heritage archive, then a blank is used instead. This happens when the blob’s origins were either still being loaded when the dataset was generated, or the loader process crashed before completing the ingestion of the blob’s origin.

    blobs-nb-origins.csv.zst: a Zst-compressed CSV mapping of how many origins of each blob are known to Software Heritage. Each line in the index associates a license blob with this count in the format SWHID NUMBER, for example:

    swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2 2822260

    Two blobs are missing because the computation crashed:

    swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    swh:1:cnt:8b137891791fe96927ad78e64b0aad7bded08bdc

    This issue will be fixed in a future version of the dataset.

    blobs-earliest.csv.zst: a Zst-compressed CSV mapping from blobs to information about their (earliest) known occurrence(s) in the archive. Format: SWHID EARLIEST_SWHID EARLIEST_TS OCCURRENCES, where:

    SWHID: blob SWHID

    EARLIEST_SWHID: SWHID of the earliest known commit containing the blob

    EARLIEST_TS: timestamp of the earliest known commit containing the blob, as a Unix time integer

    OCCURRENCES: number of known commits containing the blob
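
    EARLIEST_TS is a Unix time integer, so it usually needs converting before any date-based analysis. The sketch below parses one line and converts the timestamp to a UTC datetime. It assumes the four fields are whitespace-separated (the separator is not recoverable from the description above); the record type, the function name, and the revision SWHID, timestamp and count in the sample line are all made up.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Earliest(NamedTuple):
    swhid: str
    earliest_swhid: str
    earliest: datetime
    occurrences: int


def parse_earliest_line(line: str) -> Earliest:
    """Parse one line, assuming four whitespace-separated fields."""
    swhid, earliest_swhid, raw_ts, raw_count = line.split()
    # Interpret the Unix time integer as a timezone-aware UTC datetime.
    earliest = datetime.fromtimestamp(int(raw_ts), tz=timezone.utc)
    return Earliest(swhid, earliest_swhid, earliest, int(raw_count))


# Made-up sample: the revision SWHID, timestamp and count are illustrative.
sample = (
    "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2 "
    "swh:1:rev:" + "0" * 40 + " 1000000000 42"
)
row = parse_earliest_line(sample)
print(row.earliest.isoformat(), row.occurrences)
# prints 2001-09-09T01:46:40+00:00 42
```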

    replication-package.tar.gz: code and scripts used to produce the dataset

    licenses-annotated-sample.tar.gz: ground truth, i.e., manually annotated random sample of license blobs, with details about the kind of information they contain.

    Changes since the 2021-03-23 dataset

    More input data, due to the SWH archive growing: more origins in supported forges and package managers; and support for more forges and package managers. See the SWH Archive Changelog for details.

    Values in the NAME column of license-blobs.csv.zst are quoted, as some file names now contain commas.

    Replication package now contains all the steps needed to reproduce all artefacts including the licenseblobs/fetch.py script.

    blobs-nb-origins.csv.zst is added.

    blobs-origins.csv.zst is now generated using the first origin returned by swh-graph’s leaves endpoint, instead of its randomwalk endpoint. This should have no impact on the result, other than a different distribution of “random” origins being picked.

    blobs-origins.csv.zst was missing ~10% of its results in previous versions of the dataset, due to errors and/or timeouts in its generation; this is now down to 0.02% (1254 of the 6859445 unique blobs). Blobs with no known origins are now present, with a blank instead of a URL.

    blobs-earliest.csv.zst was missing ~10% of its results in previous versions of the dataset. It is complete now.

    blobs-scancode.csv.zst is generated with a newer scancode-toolkit version (31.2.1)

    blobs-scancode.ndjson.zst is added.

    Errata

    A file named .tmp_1340d4e2da173c92d432026ecdc54b4859fe9911 was present in the initial version of the dataset (published on 2022-11-07). It was removed on 2022-11-09 using these two commands:

    pv blobs-fileinfo.csv.zst | zstdcat | grep -v ".tmp" | zstd -19
    pv blobs.tar.zst | zstdcat | tar --delete blobs/13/40/.tmp_1340d4e2da173c92d432026ecdc54b4859fe9911 | zstd -19 -T12

    The total uncompressed size was announced as 84 GiB based on the physical size on ext4, but it is actually 66 GiB.

    Citation

    If you use this dataset for research purposes, please acknowledge its use by citing one or both of the following papers:

    Jesús M. González-Barahona, Sergio Raúl Montes León, Gregorio Robles, Stefano Zacchiroli. The software heritage license dataset (2022 edition). Empirical Software Engineering, Volume 28, Number 6, Article number 147 (2023).

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

    References

    The dataset has been built using primarily the data sources described in the following papers:

    Roberto Di Cosmo, Stefano Zacchiroli. Software Heritage: Why and How to Preserve Software Source Code. In Proceedings of iPRES 2017: 14th International Conference on Digital Preservation, Kyoto, Japan, 25-29 September 2017.

    Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli. The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019.

    Errata (v2, 2024-01-09)

    licenses-annotated-sample.tar.gz: some comments not intended for publication were removed, and 4

  6. d

    B2B Live Contact Data | 23M+ High Quality US B2B Contacts

    • datarade.ai
    .csv, .xls, .txt
    Updated Jul 18, 2023
    Cite
    1 Stop Data (2023). B2B Live Contact Data | 23M+ High Quality US B2B Contacts [Dataset]. https://datarade.ai/data-products/b2b-live-contact-data-23m-high-quality-us-b2b-contacts-1-stop-data
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 18, 2023
    Dataset authored and provided by
    1 Stop Data
    Area covered
    United States
    Description

    From our comprehensive US Data Lake, we proudly present 23M+ high-quality US decision-makers and influencers.

    Take your ABM strategy to the next level, build a strong pipeline and close deals by laser targeting key decision-makers and influencers based on their department, job functions, job responsibilities, interest areas and expertise, then utilise essential prospect information, including verified work email addresses and business phone and social links.

    Our data is sourced directly from executives, businesses, official sources and registries, standardised, de-duped, and verified, and then processed through vigorous compliance procedures for GDPR/PECR on a legitimate interest basis and RTBI etc. This results in a highly accurate single source of quality and compliant B2B data.

    It is with our B2B Live Data Lake that we can enrich your CRM data, supply new prospect data, verify leads, and provide you with a custom dataset tailored to your target audience specifications. We also cater for big data licensing to software providers and agencies that intend to supply our data to their customers and use it in their software solutions.

    and much more

    Why Choose 1 Stop Data?

    • We offer our clients a unique, single source of quality and compliant data.
    • We don't rely on 3rd party vendors.
    • We utilise extensive verification processes to help ensure phone numbers and emails are accurate and connect to the right person.
    • We are budget-friendly and our team are highly experienced.

    Products and Services:

    • The oscar4.io web platform for self-service data on demand
    • Bulk data feeds
    • Data hygiene, standardisation, cleansing and enrichment
    • Know Your Business (KYB)

    Keywords:

    B2B, Prospect Data, Validated Work Emails, Personal Emails, Email Enrichment, Company Data, Lead Enrichment, Data Enhancement, Account Based Marketing (ABM), Customer Data, Phone Enrichment, LinkedIn URL, Market Intelligence, Business Intelligence, Data Append, Contact Data, Lead Generation, 360-Degree Customer View, Data Cleansing, Lead Data, Email and Phone Validation, Data Augmentation, Segmentation, Data Enrichment, Email Marketing, Data Intelligence, Direct Marketing, Customer Insights, Audience Targeting, Audience Generation, Mobile Phone, B2B Data Enrichment, Social Advertising, Due Diligence, B2B Advertising, Audience Insights, B2B Lead Retargeting, Contact Information, Demographic Data, Consumer Data Enrichment, People-Based Marketing, Contact Data Enrichment, Customer Data Insights, Prospecting, Sales Intelligence, Predictive Analytics, Email Address Validation, Company Data Enrichment, Audience Intelligence, Cold Outreach, Analytics, Marketing Data Enrichment, Customer Acquisition, B2C Data, People Data, Professional Information, Recruiting and HR, KYC, B2B List Validation, Lead Information, Sales Prospecting, B2B Sales, B2B Data, Lead Lists, Contact Validation, Competitive Intelligence, Customer Data Enrichment, Identity Resolution, Identity Validation, Data Science, B2C Data Enrichment, B2C, Lead Data Enrichment, Social Media Data.

  7. d

    Business Licenses

    • catalog.data.gov
    Updated Jun 29, 2025
    Cite
    data.cityofchicago.org (2025). Business Licenses [Dataset]. https://catalog.data.gov/dataset/business-licenses
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    data.cityofchicago.org
    Description

    NOTE, 2/21/2025 - We have added three geographic columns to this dataset.

    Business licenses issued by the Department of Business Affairs and Consumer Protection in the City of Chicago from 2002 to the present. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded. 'C_SBA' is a change of business activity record. It means that a new business activity was added or an existing business activity was marked as expired.

    LICENSE STATUS: 'AAI' means the license was issued. 'AAC' means the license was cancelled during its term. 'REV' means the license was revoked. 'REA' means the license revocation has been appealed.

    LICENSE STATUS CHANGE DATE: This date corresponds to the date a license was cancelled (AAC), revoked (REV) or appealed (REA).

    Business License Owner information may be accessed at: https://data.cityofchicago.org/dataset/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name, which may be obtained from this Business Licenses dataset.

    Data Owner: Business Affairs and Consumer Protection.

    Time Period: January 1, 2002 to present.

    Frequency: Data is updated daily.
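
    When working with the exported CSV, the APPLICATION TYPE and LICENSE STATUS codes described above can be expanded with a simple lookup. The following Python is only a sketch: the wording of each label paraphrases the description, the function name is hypothetical, and the CSV column names are deliberately not assumed (the codes are passed in directly).

```python
# Code meanings paraphrased from the dataset description above.
APPLICATION_TYPES = {
    "ISSUE": "initial license application",
    "RENEW": "subsequent renewal",
    "C_LOC": "change of location",
    "C_CAPA": "change of capacity",
    "C_EXPA": "location expansion (liquor licenses only)",
    "C_SBA": "change of business activity",
}

LICENSE_STATUSES = {
    "AAI": "license issued",
    "AAC": "license cancelled during its term",
    "REV": "license revoked",
    "REA": "license revocation appealed",
}


def describe_record(application_type: str, license_status: str) -> str:
    """Turn a pair of codes into a short human-readable description."""
    app = APPLICATION_TYPES.get(
        application_type, f"unknown application type {application_type!r}"
    )
    status = LICENSE_STATUSES.get(
        license_status, f"unknown license status {license_status!r}"
    )
    return f"{app}; {status}"


print(describe_record("RENEW", "AAI"))
# prints subsequent renewal; license issued
```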

  8. US National MLS Property Listings Data | Multiple Listing Service | 60M+...

    • datarade.ai
    .csv, .xls, .txt
    Cite
    The Warren Group, US National MLS Property Listings Data | Multiple Listing Service | 60M+ Records | Property & Building Characteristics [Dataset]. https://datarade.ai/data-products/u-s-national-mls-real-estate-data-multiple-listing-service-the-warren-group
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset authored and provided by
    The Warren Group
    Area covered
    United States of America
    Description

    Unlock the Potential of U.S. National MLS Real Estate Data

    Discover the wealth of information encapsulated in licensing bulk MLS (Multiple Listing Service) data, a cornerstone of the real estate realm. From property particulars to market trends, delve into the significance and multifaceted utility of MLS data across diverse industries.

    MLS Real Estate Data includes:

    • Property Information: Address, size, layout, condition, amenities, and more.
    • Price History: Historical price changes, listing dates, and sales dates.
    • Geographic Insights: Location, neighborhood information, school districts, and proximity to amenities.
    • Property Photos: MLS images of properties (see the condition of a property inside and out.)
    • Agent/Broker Information: Certain details about the listing agent or broker as well as their notes on properties.
    • Market Dynamics: Data on local real estate market conditions, including inventory levels, price trends, and days on the market.
  9. d

    Firmographic Data | 4MM + US Private and Public Companies | Employees,...

    • datarade.ai
    .json, .csv, .xls
    Updated Oct 16, 2023
    Cite
    Salutary Data (2023). Firmographic Data | 4MM + US Private and Public Companies | Employees, Revenue, Website, Industry + More Firmographics [Dataset]. https://datarade.ai/data-products/salutary-data-firmographic-data-4m-us-private-and-publi-salutary-data
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Oct 16, 2023
    Dataset authored and provided by
    Salutary Data
    Area covered
    United States
    Description

    Salutary Data is a boutique B2B contact and company data provider that's committed to delivering high quality data for sales intelligence, lead generation, marketing, recruiting / HR, identity resolution, and ML / AI. Our database currently consists of 148MM+ highly curated B2B Contacts (US only), along with 4M+ companies, and is updated regularly to ensure we have the most up-to-date information.

    We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use-case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end-users.

    What makes Salutary unique?

    • We offer our clients a truly unique, one-stop aggregation of the best-of-breed quality data sources. Our supplier network consists of numerous, established high quality suppliers that are rigorously vetted.
    • We leverage third party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
    • We're reasonably priced and easy to work with.

    Products: API Suite, Web UI, Full and Custom Data Feeds

    Services:

    • Data Enrichment - We assess the fill rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
    • ABM Match & Append - Send us your domain or other company related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally throw in your suppression file to avoid any redundant records.
    • Verification (“Cleaning/Hygiene”) Services - Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove your email hard bounces, and update/replace titles or phones. This is right up our alley and leverages our existing internal and external processes and systems.

  10. d

    Property Listings Data | USA Coverage | 74% Right Party Contact Rate |...

    • datarade.ai
    Updated Aug 14, 2024
    Cite
    BatchService (2024). Property Listings Data | USA Coverage | 74% Right Party Contact Rate | BatchData [Dataset]. https://datarade.ai/data-products/batchservice-u-s-property-listings-data-real-estate-mark-batchservice
    Explore at:
    Available download formats: .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Aug 14, 2024
    Dataset authored and provided by
    BatchService
    Area covered
    United States of America
    Description

    BatchData's property listings data provides comprehensive insights with over 140 data points and nationwide listing data inclusive of For Sale By Owner (FSBO) listings across the United States. Updated daily in most markets, the data includes:

    • Listing Details: property listings descriptions, property characteristics, pricing, days on market, and more.
    • Agent Information: agent names, license numbers, contact details, listing counts, and listing histories.
    • Broker Information: Broker names, locations, URLs, emails, phone numbers, and licensing information.
    • Additional Details: Information about schools, neighborhoods, subdivisions, and tax data.

    Common Use Cases:

    • Recruiting Teams: Enhance talent acquisition by analyzing agents' listing counts, close rates, property types, and client profiles.
    • Proptech Software & Marketplaces: Integrate current and historical listings to create detailed property profiles, advanced search features, and robust analytics.
    • Home Service Providers: Target marketing and outreach efforts to homeowners, whether they are preparing to move or have recently relocated.
    • Real Estate Agents & Investors: Identify undervalued properties, connect with buyers/sellers based on activity, analyze market trends, and develop effective marketing strategies.

    Our property listings data can be delivered in a variety of formats to suit your needs. Choose from API integration for seamless, real-time data access, bulk data delivery for extensive datasets, S3 bucket storage for scalable cloud solutions, and more. This flexibility ensures that you can incorporate our comprehensive property information into your systems efficiently and effectively, whether you're building a new platform, enhancing existing tools, or conducting in-depth analyses.

  11. C

    City of Chicago Data prtal

    • data.cityofchicago.org
    application/rdfxml +5
    Updated Jun 29, 2025
    Cite
    City of Chicago (2025). City of Chicago Data prtal [Dataset]. https://data.cityofchicago.org/widgets/qd2y-e669
    Available download formats: application/rdfxml, xml, csv, application/rssxml, tsv, json
    Dataset updated
    Jun 29, 2025
    Authors
    City of Chicago
    Area covered
    Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  12. C

    dfeq

    • data.cityofchicago.org
    Updated Jun 29, 2025
    + more versions
    Cite
    City of Chicago (2025). dfeq [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/dfeq/m5d6-e9d8
    Available download formats: application/rssxml, application/rdfxml, csv, xml, application/geo+json, kml, kmz, tsv
    Dataset updated
    Jun 29, 2025
    Authors
    City of Chicago
    Description

    Business licenses issued by the Department of Business Affairs and Consumer Protection in the City of Chicago from 2006 to the present. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: ‘ISSUE’ is the record associated with the initial license application. ‘RENEW’ is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. ‘C_LOC’ is a change of location record. It means the business moved. ‘C_CAPA’ is a change of capacity record. Only a few license types may file this type of application. ‘C_EXPA’ only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: ‘AAI’ means the license was issued. ‘AAC’ means the license was cancelled during its term. ‘REV’ means the license was revoked. ‘REA’ means the license revocation has been appealed.

    LICENSE STATUS CHANGE DATE: This date corresponds to the date a license was cancelled (AAC), revoked (REV) or appealed (REA).

    Business License Owner information may be accessed at: https://data.cityofchicago.org/dataset/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name, which may be obtained from this Business Licenses dataset.

    Data Owner: Business Affairs and Consumer Protection. Time Period: January 1, 2006 to present. Frequency: Data is updated daily.
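
    The status codes described above can be tallied from a CSV export with a short script. This is a minimal sketch, not part of the dataset documentation: the column name "LICENSE STATUS" and the sample rows are assumptions made for illustration, so check the header of the actual export before relying on them.

```python
import csv
import io

# Status codes as described in the dataset notes above.
LICENSE_STATUS = {
    "AAI": "issued",
    "AAC": "cancelled during its term",
    "REV": "revoked",
    "REA": "revocation appealed",
}


def count_by_status(csv_file):
    """Count rows per license status in a CSV export.

    The column name "LICENSE STATUS" is an assumption; verify it against
    the header row of the real file.
    """
    counts = {}
    for row in csv.DictReader(csv_file):
        code = (row.get("LICENSE STATUS") or "").strip()
        label = LICENSE_STATUS.get(code, "unknown")
        counts[label] = counts.get(label, 0) + 1
    return counts


# Tiny made-up sample standing in for the real export.
sample = io.StringIO(
    "LICENSE ID,APPLICATION TYPE,LICENSE STATUS\n"
    "1,ISSUE,AAI\n"
    "2,RENEW,AAI\n"
    "3,ISSUE,REV\n"
)
print(count_by_status(sample))
```

    Reading the file row by row avoids loading the whole export at once, which matters given the note that the file is too large to view in full in Excel.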

  13. z

    The Paradox of Innovation Non-Disclosure: Evidence from Licensing Contracts...

    • zenodo.org
    bin, zip
    Updated May 29, 2025
    Cite
    Alan Kwan; Gaurav Kankanhalli; Ken Merkley (2025). The Paradox of Innovation Non-Disclosure: Evidence from Licensing Contracts (patent data) [Dataset]. http://doi.org/10.5281/zenodo.10426559
    Available download formats: bin, zip
    Dataset updated
    May 29, 2025
    Dataset provided by
    American Economic Journal: Applied Economics
    Authors
    Alan Kwan; Gaurav Kankanhalli; Ken Merkley
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
  14. CoreLogic Loan-Level Market Analytics

    • redivis.com
    application/jsonl +7
    Updated Aug 15, 2024
    Cite
    Stanford University Libraries (2024). CoreLogic Loan-Level Market Analytics [Dataset]. http://doi.org/10.57761/a96q-1j33
    Available download formats: avro, sas, spss, stata, arrow, parquet, csv, application/jsonl
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford University Libraries
    Description

    Abstract

    The CoreLogic Loan-Level Market Analytics (LLMA) for primary mortgages dataset contains detailed loan data, including origination, events, performance, forbearance and inferred modification data.

    Methodology

    CoreLogic sources the Loan-Level Market Analytics data directly from loan servicers. CoreLogic cleans and augments the contributed records with modeled data. The Data Dictionary indicates which fields are contributed and which are inferred.

    The Loan-Level Market Analytics data is aimed at providing lenders, servicers, investors, and advisory firms with the insights they need to make trustworthy assessments and accurate decisions. Stanford Libraries has purchased the Loan-Level Market Analytics data for researchers interested in housing, economics, finance and other topics related to prime and subprime first lien data.

    CoreLogic provided the data to Stanford Libraries as pipe-delimited text files, which we have uploaded to Data Farm (Redivis) for preview, extraction and analysis.

    For more information about how the data was prepared for Redivis, please see CoreLogic 2024 GitLab.
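
    For readers working with pipe-delimited text files like those mentioned above, a minimal parsing sketch follows. The field names (loan_id, orig_year, state) and the sample rows are hypothetical and are not taken from the LLMA Data Dictionary.

```python
import csv
import io


def read_pipe_delimited(text_file):
    """Parse a pipe-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(text_file, delimiter="|"))


# Made-up sample; real field names come from the Data Dictionary.
sample = io.StringIO(
    "loan_id|orig_year|state\n"
    "1001|2015|CA\n"
    "1002|2016|TX\n"
)
rows = read_pipe_delimited(sample)
print(rows[0]["state"])
```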

    Usage

    Per the End User License Agreement, the LLMA Data cannot be commingled (i.e. merged, mixed or combined) with Tax and Deed Data that Stanford University has licensed from CoreLogic, or other data which includes the same or similar data elements or that can otherwise be used to identify individual persons or loan servicers.

    The 2015 major release of CoreLogic Loan-Level Market Analytics (for primary mortgages) was intended to enhance the CoreLogic servicing consortium through data quality improvements and integrated analytics. See CL_LLMA_ReleaseNotes.pdf for more information about these changes.

    For more information about included variables, please see CL_LLMA_Data_Dictionary.pdf.

    For more information about how the database was set up, please see LLMA_Download_Guide.pdf.

    Bulk Data Access

    Data access is required to view this section.

  15. C

    PPA and Music and Dance Venues

    • data.cityofchicago.org
    Updated Jul 1, 2025
    Cite
    City of Chicago (2025). PPA and Music and Dance Venues [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/PPA-and-Music-and-Dance-Venues/3bhe-fmu4
    Available download formats: xml, application/rssxml, csv, tsv, application/rdfxml, kmz, application/geo+json, kml
    Dataset updated
    Jul 1, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  16. Z

    Data from: unarXive: A Large Scholarly Data Set with Publications'...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Apr 17, 2024
    + more versions
    Cite
    Saier, Tarek (2024). unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2553522
    Explore at:
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    Saier, Tarek
    Färber, Michael
    Description

    unarXive is a scholarly data set containing publications' full-text, annotated in-text citations, and a citation network.

    The data is generated from all LaTeX sources on arXiv and therefore of higher quality than data generated from PDF files.

    Typical use cases are:

    • Citation recommendation
    • Citation context analysis
    • Bibliographic analyses
    • Reference string parsing

    This version (v3) of our data set is based on all arXiv publications until 2020-07-31 and on the Microsoft Academic Graph as of 2020-08-18. As additional contribution, we included a table with the publication date and the scientific discipline for each paper for easier filtering.

    Note: This Zenodo record is an old version of unarXive. You can find the most recent version at https://zenodo.org/record/7752754 and https://zenodo.org/record/7752615

    Access

    Download sample

    To download the whole data set send an access request and note the following:

    Note: this Zenodo record is a "full" version of unarXive, which was generated from all of arXiv.org including non-permissively licensed papers. Make sure that your use of the data is compliant with the paper's licensing terms.¹

    ¹ For information on papers' licenses use arXiv's bulk metadata access.

    The code used for generating the data set is publicly available.

    Usage examples for our data set are provided on GitHub.

    Citing

    This initial version of unarXive is described in the following journal article.

    Tarek Saier, Michael Färber: "unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata", Scientometrics, 2020. [link to an author copy]

    The updated version is described in the following conference paper.

    Tarek Saier, Michael Färber. "unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network", JCDL 2023. [link to an author copy]

  17. C

    Active Sites as of the end of the Dashboard Month

    • data.cityofchicago.org
    application/rdfxml +5
    Updated Jun 29, 2025
    Cite
    City of Chicago (2025). Active Sites as of the end of the Dashboard Month [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/Active-Sites-as-of-the-end-of-the-Dashboard-Month/f5ei-3p8h
    Available download formats: json, csv, application/rssxml, xml, application/rdfxml, tsv
    Dataset updated
    Jun 29, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  18. C

    CHICAGO LICENSING

    • data.cityofchicago.org
    Updated Jun 29, 2025
    + more versions
    Cite
    City of Chicago (2025). CHICAGO LICENSING [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/CHICAGO-LICENSING/sssn-u5sf
    Available download formats: tsv, csv, application/rdfxml, xml, application/rssxml, kml, kmz, application/geo+json
    Dataset updated
    Jun 29, 2025
    Authors
    City of Chicago
    Area covered
    Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  19. C

    Data from: Manufacturing Establishments

    • data.cityofchicago.org
    Updated Jun 29, 2025
    Cite
    City of Chicago (2025). Manufacturing Establishments [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/Manufacturing-Establishments/es3k-j9sz
    Available download formats: application/rssxml, application/rdfxml, csv, tsv, xml, kml, application/geo+json, kmz
    Dataset updated
    Jun 29, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  20. C

    2660533

    • data.cityofchicago.org
    Updated Jul 1, 2025
    + more versions
    Cite
    City of Chicago (2025). 2660533 [Dataset]. https://data.cityofchicago.org/widgets/8tpp-pj5v?mobile_redirect=true
    Available download formats: tsv, csv, kml, application/rdfxml, xml, application/geo+json, application/rssxml, kmz
    Dataset updated
    Jul 1, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

Cite
Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163

Data from: A Large-scale Dataset of (Open Source) License Text Variants

Related Article
Explore at:
Dataset updated
Mar 31, 2022
Dataset authored and provided by
Stefano Zacchiroli
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

For more details see the included README file and companion paper:

Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
