100+ datasets found
  1. Common Crawl Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 8, 2014
    + more versions
    Cite
    Common Crawl Dataset [Dataset]. https://paperswithcode.com/dataset/common-crawl
    Dataset updated
    Oct 8, 2014
    Description

    The Common Crawl corpus contains petabytes of data collected over 12 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.

  2. The Global Anti crawling Techniques Market is Growing at Compound Annual...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Dec 22, 2024
    Cite
    Cognitive Market Research (2024). The Global Anti crawling Techniques Market is Growing at Compound Annual Growth Rate of 6.00% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/anti-crawling-techniques-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Dec 22, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global anti-crawling techniques market size is USD XX million in 2023 and will expand at a compound annual growth rate (CAGR) of 6.00% from 2023 to 2030.

    North America held the largest share of the anti-crawling techniques market, at more than 40% of global revenue, and will grow at a CAGR of 4.2% from 2023 to 2030.
    Europe accounted for over 30% of the global market and is projected to expand at a CAGR of 4.5% from 2023 to 2030.
    Asia Pacific held more than 23% of global revenue and will grow at a CAGR of 8.0% from 2023 to 2030.
    South America held more than 5% of global revenue and will grow at a CAGR of 5.4% from 2023 to 2030.
    The Middle East and Africa held more than 2% of global revenue and will grow at a CAGR of 5.7% from 2023 to 2030.
    The market for anti-crawling techniques has grown dramatically as a result of the increasing number of data breaches and growing public awareness of the need to protect sensitive data.
    Demand for bot fingerprint databases remains high in the anti-crawling techniques market.
    The content protection category held the highest revenue share of the anti-crawling techniques market in 2023.
    

    Increasing Demand for Protection and Security of Online Data to Provide Viable Market Output

    The market for anti-crawling techniques is expanding due in large part to the growing requirement for online data security and protection. Due to an increase in digital activity, organizations are processing and storing enormous volumes of sensitive data online. Organizations are being forced to invest in strong anti-crawling techniques due to the growing threat of data breaches, illegal access, and web scraping occurrences. By protecting online data from harmful activity and guaranteeing its confidentiality and integrity, these technologies advance the industry. Moreover, the significance of protecting digital assets is increased by the widespread use of the Internet for e-commerce, financial transactions, and sensitive data transfers. Anti-crawling techniques are essential for reducing the hazards connected to online scraping, which is a tactic often used by hackers to obtain important data.

    Increasing Incidence of Cyber Threats to Propel Market Growth
    

    The growing prevalence of cyber risks, such as site scraping and data harvesting, is driving growth in the market for anti-crawling techniques. Organizations that rely heavily on digital platforms run a higher risk of having data extracted illicitly. To safeguard sensitive data and preserve the integrity of digital assets, organizations have been pushed to invest in sophisticated anti-crawling techniques that strengthen online defenses. The market's growth also reflects growing awareness of cybersecurity issues and the need for effective defenses against changing cyber threats. In addition, cybersecurity is constantly challenged by the spread of advanced, automated crawling programs. The ever-changing threat landscape forces enterprises to implement anti-crawling techniques, which use a variety of tools such as rate limiting, IP blocking, and CAPTCHAs to prevent fraudulent scraping.
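
    To make one of the tools named above concrete, here is a minimal sketch of per-client rate limiting using a token bucket. It is an illustration only, not something taken from the report: the class names, the `rate` and `capacity` parameters, and the example IP addresses are all assumptions.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so short bursts are allowed
        self._now = now
        self._last = now()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (here, a hypothetical client IP string)."""

    def __init__(self, rate: float, capacity: int,
                 now: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._now = now
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_ip: str) -> bool:
        bucket = self._buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._now)
            self._buckets[client_ip] = bucket
        return bucket.allow()


if __name__ == "__main__":
    limiter = RateLimiter(rate=1.0, capacity=2)
    # Three back-to-back requests from one documentation-range address.
    for _ in range(3):
        print(limiter.allow("203.0.113.5"))
```

    A server would typically answer a rejected request with an HTTP 429 status. The clock is injectable so the behaviour can be checked without sleeping.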

    Market Restraints of Anti-Crawling Techniques

    Increasing Demand for Ethical Web Scraping to Restrict Market Growth
    

    The growing demand for ethical web scraping presents a unique challenge to the anti-crawling techniques market. Ethical web scraping is the process of obtaining data from websites for lawful purposes, such as market research or data analysis, without breaching the terms of service. The restraint arises because anti-crawling techniques must distinguish between criminal and ethical scraping operations, balancing the protection of websites from misuse against permitting authorized data harvesting. This dynamic calls for more complex and adaptable anti-crawling techniques that can tell destructive scraping from ethical scraping.

    Impact of COVID-19 on the Anti Crawling Techniques Market

    The demand for online material has increased as a result of the COVID-19 pandemic, which has...

  3. ScrapeHero Data Cloud - Free and Easy to use

    • datarade.ai
    .json, .csv
    Updated Feb 8, 2022
    Cite
    Scrapehero (2022). ScrapeHero Data Cloud - Free and Easy to use [Dataset]. https://datarade.ai/data-products/scrapehero-data-cloud-free-and-easy-to-use-scrapehero
    Available download formats: .json, .csv
    Dataset updated
    Feb 8, 2022
    Dataset provided by
    ScrapeHero
    Authors
    Scrapehero
    Area covered
    Bhutan, Ghana, Bahamas, Portugal, Slovakia, Anguilla, Niue, Chad, Dominica, Bahrain
    Description

    The Easiest Way to Collect Data from the Internet

    Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers, or with a few lines of code using our APIs.

    We have made it as simple as possible to collect data from websites.

    Easy to Use Crawlers

    Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.

    Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.

    Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.

    Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more from Amazon.

    Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.

    Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.

    Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.

    Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.

    Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.

    Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.

    Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.

    Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.

    Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.

    Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.

    LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.

    Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.

    Yelp Business Details Scraper: Scrape business details such as phone number, address, website, and more from Yelp search and business details pages.

    Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, broker name, and more.

    Amazon product offers and third party sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.

    Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.

    Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.

    Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.

    Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, the answer, the name of the user who answered, and more.

    Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.

  4. Global import data of Crawler Excavator

    • volza.com
    csv
    Updated Mar 22, 2025
    + more versions
    Cite
    Volza FZ LLC (2025). Global import data of Crawler Excavator [Dataset]. https://www.volza.com/p/crawler-excavator/import/
    Available download formats: csv
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    Volza
    Authors
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    119,416 global import shipment records of Crawler Excavator, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.

  5. Data from: esCorpius: A Massive Spanish Crawling Corpus

    • lindat.cz
    • live.european-language-grid.eu
    • +1more
    Updated Sep 10, 2022
    + more versions
    Cite
    Gutiérrez-Fandiño Asier; Pérez-Fernández David; Armengol-Estapé Jordi; Griol David; Callejas Zoraida (2022). esCorpius: A Massive Spanish Crawling Corpus [Dataset]. https://lindat.cz/repository/xmlui/handle/11372/LRT-4807?show=full
    Dataset updated
    Sep 10, 2022
    Authors
    Gutiérrez-Fandiño Asier; Pérez-Fernández David; Armengol-Estapé Jordi; Griol David; Callejas Zoraida
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    In recent years, Transformer-based models have led to significant advances in language modelling for natural language processing. However, they require a vast amount of data to be (pre-)trained, and there is a lack of corpora in languages other than English. Recently, several initiatives have presented multilingual datasets obtained from automatic web crawling. However, the results in Spanish present important shortcomings, as they are either too small in comparison with other languages or of low quality due to sub-optimal cleaning and deduplication. In this paper, we introduce esCorpius, a Spanish crawling corpus obtained from nearly 1 PB of Common Crawl data. It is the most extensive corpus in Spanish with this level of quality in the extraction, purification and deduplication of web textual content. Our data curation process involves a novel, highly parallel cleaning pipeline and encompasses a series of deduplication mechanisms that together ensure the integrity of both document and paragraph boundaries. Additionally, we maintain both the source web page URL and the WARC shard origin URL in order to comply with EU regulations. esCorpius has been released under the CC BY-NC-ND 4.0 license.

  6. Global import data of Crawler

    • volza.com
    csv
    Updated Mar 24, 2025
    + more versions
    Cite
    Volza FZ LLC (2025). Global import data of Crawler [Dataset]. https://www.volza.com/p/crawler/import/import-in-united-states/coo-hong-kong/
    Available download formats: csv
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Volza
    Authors
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    42 global import shipment records of Crawler, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.

  7. Abcúg

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 13, 2022
    + more versions
    Cite
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner (2022). Abcúg [Dataset]. http://doi.org/10.5281/zenodo.4636762
    Available download formats: bin
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 17, 2014 - Dec 31, 2019
    Description

    This object has been created as a part of the web harvesting project of the Eötvös Loránd University Department of Digital Humanities (ELTE DH). Learn more about the workflow HERE and about the software used HERE. The aim of the project is to make online news articles and their metadata suitable for research purposes. The archiving workflow is designed to prevent modification or manipulation of the downloaded content. The current version of the curated content, with normalized formatting in standard TEI XML format and Schema.org encoded metadata, is available HERE. The detailed description of the raw content is the following:

    • The portal's archived content (from 2014-09-17 to 2019-12-31) in WARC format is available HERE (crawled: 2020-01-27T18:58:23 - 2020-01-27T22:58:20.024419). No further versions are expected because the crawl was created after the portal stopped publication.

  8. Crawler data set of extreme drought historical events in 34 key node areas...

    • tpdc.ac.cn
    • data.tpdc.ac.cn
    zip
    Updated Apr 30, 2020
    Cite
    Yong GE; Feng LING (2020). Crawler data set of extreme drought historical events in 34 key node areas along the route of One Belt And One Road [Dataset]. https://www.tpdc.ac.cn/view/googleSearch/dataDetail?metadataId=c3530763-416e-4243-9115-554116a388c9
    Available download formats: zip
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    National Tibetan Plateau Data Center (tpdc.ac.cn)
    Authors
    Yong GE; Feng LING
    Description

    Historical data on extreme drought damage events in the 34 key areas along One Belt One Road were collected from the Internet. First, a web crawler was written in Python. Then, using several keywords about extreme drought damage, web pages were collected via the Google and Baidu search engines. Finally, important information about the extreme drought events (e.g., place, time, affected area, affected population, death toll) was extracted from the web pages. This data can be used for risk assessment of extreme drought in the 34 key areas along One Belt One Road.
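
    The extraction step described above might look roughly like the sketch below. It is not the dataset authors' code: the keyword list, the regular expressions, the `DroughtEvent` fields, and the sample sentence are all made up for illustration, and a real pipeline would need far more robust parsing.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical search keywords; the dataset description does not list the real ones.
KEYWORDS = ("extreme drought", "severe drought")

# Illustrative patterns only: a four-digit year, and a number followed by a death phrase.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")
DEATHS_RE = re.compile(r"(\d[\d,]*)\s+(?:deaths|people died)", re.IGNORECASE)


@dataclass
class DroughtEvent:
    year: Optional[int]
    deaths: Optional[int]
    snippet: str


def is_relevant(text: str) -> bool:
    """Keep only pages that mention at least one drought keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def extract_event(text: str) -> Optional[DroughtEvent]:
    """Pull a year and a death count out of page text, if the page is relevant."""
    if not is_relevant(text):
        return None
    year_match = YEAR_RE.search(text)
    deaths_match = DEATHS_RE.search(text)
    return DroughtEvent(
        year=int(year_match.group(1)) if year_match else None,
        deaths=int(deaths_match.group(1).replace(",", "")) if deaths_match else None,
        snippet=text[:80],
    )


if __name__ == "__main__":
    # Invented sample text, not taken from the dataset.
    page = "An extreme drought in 2010 caused 1,200 deaths across the region."
    print(extract_event(page))
```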

  9. AEROARMS - Image Dataset for the Crawler Indirect Detection through its Cage...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Jan 24, 2020
    Cite
    Javier Laplaza; Albert Pumarola; Juan Andrade; Alberto Sanfeliu (2020). AEROARMS - Image Dataset for the Crawler Indirect Detection through its Cage [Dataset]. http://doi.org/10.5281/zenodo.2636666
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Javier Laplaza; Albert Pumarola; Juan Andrade; Alberto Sanfeliu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset containing images and ground-truth position of the crawler's cage used in the AEROARMS project experiments.

  10. Live Crawling Service Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jan 25, 2025
    Cite
    Market Research Forecast (2025). Live Crawling Service Report [Dataset]. https://www.marketresearchforecast.com/reports/live-crawling-service-13827
    Available download formats: ppt, doc, pdf
    Dataset updated
    Jan 25, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Overview: The global live crawling service market is experiencing significant growth, fueled by the increasing adoption of data analytics and the need for real-time data insights. With a market size of USD XXX million in 2025 and a CAGR of XX%, it is projected to reach a value of USD million by 2033. The market is driven by the proliferation of digital technologies, the growing demand for personalization in various industries, and the need to improve decision-making capabilities.

    Key Trends and Segments: Two primary segments drive the live crawling service market: Type (web data crawling, PDF data crawling, others) and Application (SMEs, large enterprises). Key trends include the rise of artificial intelligence (AI) and machine learning (ML), which enhance data extraction accuracy and efficiency. Moreover, the adoption of cloud-based crawling services is increasing due to their scalability, cost-effectiveness, and ease of implementation. Regionally, North America dominates the market, followed by Europe and Asia-Pacific. Emerging economies in Asia-Pacific and the Middle East and Africa are expected to witness significant growth due to rapid digitalization and the expanding adoption of data analytics solutions.

  11. The Global Crawler Camera market size was USD 966.8 Million in 2023!

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jan 24, 2024
    Cite
    Cognitive Market Research (2024). The Global Crawler Camera market size was USD 966.8 Million in 2023! [Dataset]. https://www.cognitivemarketresearch.com/crawler-camera-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jan 24, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global crawler camera market size is USD 966.8 million in 2023 and will expand at a compound annual growth rate (CAGR) of 15.50% from 2023 to 2030.

    North America held the largest share of the crawler camera market, at more than 40% of global revenue, with a market size of USD 141.04 million in 2023, and will grow at a CAGR of 13.7% from 2023 to 2030.
    Europe accounted for over 30% of the global market, with a market size of USD 352.6 million in 2023.
    Asia Pacific held more than 23% of global revenue, with a market size of USD 352.6 million in 2023, and will grow at a CAGR of 17.5% from 2023 to 2030.
    South America held more than 5% of global revenue, with a market size of USD 17.63 million in 2023, and will grow at a CAGR of 14.9% from 2023 to 2030.
    The Middle East and Africa held more than 2% of global revenue, with a market size of USD 352.6 million in 2023, and will grow at a CAGR of 15.2% from 2023 to 2030.
    The demand for crawler cameras is rising due to the numerous strategies adopted by key participants.
    Demand for pipe inspection crawlers remains high in the crawler camera market.
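
    The growth rates quoted above follow the standard compound-growth relation, value = base × (1 + CAGR)^years. As a rough sketch of that arithmetic (the function names are illustrative and do not come from the report), the headline figures of USD 966.8 million in 2023 and a 15.50% CAGR can be compounded forward to 2030 as follows:

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound `base` forward for `years` at a constant annual rate `cagr`."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Invert the relation: the constant annual rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    base_2023 = 966.8  # USD million, as stated in the description
    cagr = 0.155       # 15.50% per year, as stated in the description
    years = 2030 - 2023

    projected_2030 = project_value(base_2023, cagr, years)
    print(f"Projected 2030 market size: USD {projected_2030:.1f} million")
    print(f"Round-trip CAGR: {implied_cagr(base_2023, projected_2030, years):.4f}")
```

    The same two functions can be applied to any regional line that gives both a base-year size and a CAGR.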
    

    Infrastructure Development and Regulatory Compliance to Provide Viable Market Output

    Increasing infrastructure development projects, such as the construction of pipelines, sewer systems, and utility networks, drive the demand for crawler camera systems. These systems play a crucial role in inspecting and maintaining the integrity of these infrastructure assets. Moreover, regulatory requirements and standards for inspection and maintenance of infrastructure assets, particularly in sectors such as wastewater management and utilities, drive the demand for crawler camera systems. Compliance with these regulations is essential for ensuring public safety and environmental protection.

    For instance, in 2018, Rausch Electronics USA, a manufacturer of sewer inspection equipment, acquired Ratech Electronics Ltd, a Canadian manufacturer of inspection cameras and equipment. This acquisition allowed Rausch Electronics to expand its product offerings and reach in the crawler camera market.

    (Source: tracxn.com/d/companies/rausch-electronics-usa/_CoH3HIoSSoIIQ0rftC8-rvtULB86Oh2q19IrH78jvts)

    Increasing Awareness of Preventive Maintenance and Environmental Concerns to Propel Market Growth
    

    Industries are increasingly recognizing the benefits of preventive maintenance over reactive maintenance. Regular inspections using crawler camera systems allow for early detection of issues, reducing the risk of costly breakdowns and ensuring uninterrupted operations. In addition, the growing environmental concerns and the need for sustainable practices drive the demand for crawler camera systems. By identifying and addressing issues in underground and underwater infrastructure, these systems help prevent leaks, spills, and other environmental hazards.

    For instance, in 2021, RICOH launched the R Development Kit, a compact and versatile crawler camera system. This system features a high-resolution camera, LED lighting, and wireless connectivity, allowing users to inspect and capture images and videos in various applications.

    (Source: support.ricoh.com/bb_v1oi/pub_e/oi_view/0001080/0001080106/view/manual/int/0014.htm)

    Market Restraints of the Crawler Camera Market

    High Initial Investment, Lack of Awareness and Knowledge, and Technical Limitations to Restrict Market Growth
    

    The crawler camera market faces several key restraints that impact its development. One significant restraint is the high initial investment required for crawler camera systems, which can deter small and medium-sized businesses with limited budgets from adopting these systems. Additionally, there is a lack of awareness and knowledge about the benefits and capabilities of crawler camera systems, hindering their wider adoption. Technical limitations such as battery life, manoeuvrability challenges, and difficulties in capturing clear images or videos in certain conditions also restrain market growth. The need for specialized training and skill sets to operate and interpret data from crawler camera systems can be a barrier for some organizations. Market fragmentation, with multipl...

  12. Kuruc.info [WARC 2000-2022]

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 13, 2022
    Cite
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner (2022). Kuruc.info [WARC 2000-2022] [Dataset]. http://doi.org/10.5281/zenodo.6334479
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner
    Time period covered
    May 9, 2000 - Feb 17, 2022
    Description

    This object contains only a fraction of the available content for the portal. For further information on the content and for other fractions see: Kuruc.info.
    Please fill in the following form before requesting access to this dataset: ACCESS FORM

  13. Természet Világa [TEI]

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 13, 2022
    Cite
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner (2022). Természet Világa [TEI] [Dataset]. http://doi.org/10.5281/zenodo.5831344
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner
    Time period covered
    Dec 15, 2021
    Description

    This object is the most comprehensive curated version available at the date of publication. For further information on the content and for other fractions see: Természet Világa.
    Please fill in the following form before requesting access to this dataset: ACCESS FORM

  14. Abcúg [WARC 2014-2019]

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 13, 2022
    Cite
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner (2022). Abcúg [WARC 2014-2019] [Dataset]. http://doi.org/10.5281/zenodo.4664438
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner
    Time period covered
    Sep 17, 2014 - Dec 31, 2019
    Description

    This object contains only a fraction of the available content for the portal. For further information on the content and for other fractions see: Abcúg.
    Please fill in the following form before requesting access to this dataset: ACCESS FORM

  15. Index / koronavírus [2021-01-31/2021-05-24]

    • zenodo.org
    • explore.openaire.eu
    • +1more
    Updated Apr 13, 2022
    + more versions
    Cite
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner (2022). Index / koronavírus [2021-01-31/2021-05-24] [Dataset]. http://doi.org/10.5281/zenodo.4899579
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gábor Palkó; Balázs Indig; Zsófia Fellegi; Zsófia Sárközi-Lindner
    Time period covered
    Jan 31, 2021 - May 24, 2021
    Description

    This object contains only a fraction of the available content for the portal. For further information on the content and for other fractions see: Index / koronavírus.
    Please fill in the following form before requesting access to this dataset: ACCESS FORM

  16. PromptCloud Ecommerce Data - Web Scraping & Data Extraction from Online...

    • datarade.ai
    .json, .xml, .csv
    Updated Nov 21, 2023
    Cite
    PromptCloud (2023). PromptCloud Ecommerce Data - Web Scraping & Data Extraction from Online Marketplaces Globally | Custom Data Extraction Services | 99% Data Accuracy [Dataset]. https://datarade.ai/data-products/promptcloud-ecommerce-data-web-scraping-data-extraction-f-promptcloud
    Available download formats: .json, .xml, .csv
    Dataset updated
    Nov 21, 2023
    Dataset authored and provided by
    PromptCloud
    Area covered
    Falkland Islands (Malvinas), Bolivia (Plurinational State of), Tokelau, Canada, Virgin Islands (British), Panama, Greece, Pakistan, Åland Islands, Mongolia
    Description

    You can implement eCommerce data scraping projects quickly by following a few easy steps, and you will see that our core focus is on data quality and speed of implementation.

    We can fulfill your large scale data scraping requirements even on complex sites without any coding in the shortest time possible. We have ready-to-use eCommerce scraping recipes as a result of our vast experience in building large-scale web crawlers for multiple clients across different verticals, catering to various use cases, including, but not limited to:

    1. Product Price Tracking
    2. Product Demand Analysis
    3. Product Trends
    4. Sentiment Analysis
    5. Seller Analysis
    6. Competitor Monitoring

    We are committed to putting data at the heart of your business. Reach out for a no-frills PromptCloud experience: professional, technologically advanced, and reliable.

  17. Crawler Drill Import Data India, Crawler Drill Customs Import Shipment Data

    • seair.co.in
    + more versions
    Seair Exim, Crawler Drill Import Data India, Crawler Drill Customs Import Shipment Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    India
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  18. Data from: A winged relative of ice crawlers in amber bridges the cryptic...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Mar 14, 2024
    + more versions
    Yingying Cui; Jérémie Bardin; Benjamin Wipfler; Alexandre Demers‐Potvin; Ming Bai; Yi‐Jie Tong; Grace Nuoxi Chen; Huarong Chen; Zhen‐Ya Zhao; Dong Ren; Olivier Béthoux (2024). A winged relative of ice crawlers in amber bridges the cryptic extant Xenonomia and a rich fossil record [Dataset]. http://doi.org/10.5061/dryad.18931zd4f
    Available download formats: zip
    Dataset updated
    Mar 14, 2024
    Dataset provided by
    Leibniz Institute for the Analysis of Biodiversity Change
    McGill University
    Centre de recherche en paléontologie - Paris
    Chinese Academy of Sciences
    Capital Normal University
    South China Normal University
    Authors
    Yingying Cui; Jérémie Bardin; Benjamin Wipfler; Alexandre Demers‐Potvin; Ming Bai; Yi‐Jie Tong; Grace Nuoxi Chen; Huarong Chen; Zhen‐Ya Zhao; Dong Ren; Olivier Béthoux
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Until the advent of phylogenomics, the atypical morphology of extant representatives of the insect orders Grylloblattodea (ice crawlers) and Mantophasmatodea (gladiators) had confounding effects on efforts to resolve their placement within Polyneoptera. This recent research has unequivocally shown that these species‐poor groups are closely related and form the clade Xenonomia. Nonetheless, divergence dates of these groups remain poorly constrained, and their evolutionary history debated, as the few well‐identified fossils, characterized by a suite of morphological features similar to that of extant forms, are comparatively young. Notably, the extant forms of both groups are wingless, whereas most of the pre‐Cretaceous insect fossil record is composed of winged insects, which represents a major shortcoming of the taxonomy. Here, we present new specimens embedded in Early Cretaceous amber from Myanmar and belonging to the recently described species Aristovia daniili. The abundant material and pristine preservation allowed a detailed documentation of the morphology of the species, including critical head features. Combined with a morphological data set encompassing all Polyneoptera, these new data unequivocally demonstrate that A. daniili is a winged stem Grylloblattodea. This discovery demonstrates that winglessness was acquired independently in Grylloblattodea and Mantophasmatodea. Concurrently, wing apomorphic traits shared by the new fossil and earlier fossils demonstrate that a large subset of the former “Protorthoptera” assemblage, representing a third of all known insect species in some Permian localities, are genuine representatives of Xenonomia. Data from the fossil record depict a distinctive evolutionary trajectory, with the group being both highly diverse and abundant during the Permian but experiencing a severe decline from the Triassic onwards. 
    Methods: The RTI file composing this dataset was derived from a set of photographs obtained using a light dome about 30 cm in diameter equipped with 54 LEDs, and a Canon EOS 5DS camera equipped with an MP-E 65 mm macro lens, both driven by a control box (dome and control box, Flydome, Paris, France; camera body and lens, Canon, Tokyo, Japan). The 45 usable photographs (9 were excluded due to improper exposure) were batch-optimized, including a ‘horizontal flipping’ step, using Adobe Photoshop CS6 and were then compiled into an RTI file using the RTI Builder software v. 2.0.2 with the HSH fitter (software freely available from Cultural Heritage Imaging, San Francisco, CA, USA).

  19. This is a test for the crawlers

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 20, 2020
    John (2020). This is a test for the crawlers [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4107084
    Dataset updated
    Oct 20, 2020
    Dataset authored and provided by
    John
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    this is a test for the CONP crawler

  20. Sany Crawler Crane Import Data India, Sany Crawler Crane Customs Import...

    • seair.co.in
    Updated Nov 8, 2016
    + more versions
    Seair Exim (2016). Sany Crawler Crane Import Data India, Sany Crawler Crane Customs Import Shipment Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Nov 8, 2016
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    India
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
