Nowadays, web portals play an essential role in searching for and retrieving information across several fields of knowledge: they are ever more technologically advanced and designed to support the storage of huge amounts of natural-language information originating from the queries launched by users worldwide. A good example is the WorldWideScience search engine:

The database is available at http://worldwidescience.org/. It is based on a similar gateway, Science.gov, which is the major path to U.S. government science information, as it pulls together Web-based resources from various agencies. The information in the database is intended to be of high quality and authority, as well as the most current available from the participating countries in the Alliance, so users will find that the results will be more refined than those from a general search of Google. It covers the fields of medicine, agriculture, the environment, and energy, as well as basic sciences. Most of the information may be obtained free of charge (the database itself may be used free of charge) and is considered ‘‘open domain.’’ As of this writing, there are about 60 countries participating in WorldWideScience.org, providing access to 50+ databases and information portals. Not all content is in English. (Bronson, 2009)

Given this scenario, we focused on building a corpus from the query logs registered by the GreyGuide (Repository and Portal to Good Practices and Resources in Grey Literature) and received by the WorldWideScience.org (The Global Science Gateway) portal. The aim is to retrieve information related to social media, which today represents a considerable source of data ever more widely used for research purposes. The project covers eight months of query logs registered between July 2017 and February 2018, for a total of 445,827 queries. The analysis concentrates mainly on the semantics of the queries received from the portal's clients: it is a process of information retrieval from a rich digital catalogue whose language is dynamic and evolving, and which follows, as well as reflects, the cultural changes of modern society.
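As a rough illustration of the kind of analysis such a corpus supports, the sketch below counts how often social-media-related terms appear in a query log. The file name, the one-query-per-row CSV layout, and the term list are assumptions for illustration; the actual GreyGuide/WorldWideScience log format is not described here.

```python
# A minimal sketch, assuming a CSV log with one query string per row.
from collections import Counter
import csv

# Hypothetical list of social-media-related terms to look for.
SOCIAL_MEDIA_TERMS = {"social media", "facebook", "twitter", "instagram", "youtube"}

def count_social_media_queries(path):
    """Count how often each term appears in the logged queries."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue
            query = row[0].lower()  # assume the query text is in the first column
            for term in SOCIAL_MEDIA_TERMS:
                if term in query:
                    counts[term] += 1
    return counts

# Hypothetical file name covering the July 2017 - February 2018 logs.
print(count_social_media_queries("query_logs_2017-07_2018-02.csv"))
```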
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
- Train LLMs or chatbots on cinematic language and metadata
- Build or enrich movie recommendation engines (see the sketch after this list)
- Run cross-lingual or multi-region film analytics
- Benchmark genre popularity across time periods
- Power academic studies or entertainment dashboards
- Feed into knowledge graphs, search engines, or NLP pipelines
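As a hedged illustration of the recommendation use case above, here is a minimal content-based recommender over the metadata fields listed earlier (title, genre, synopsis). The file name, column names, and example title are assumptions; the dataset's actual export schema may differ.

```python
# A minimal content-based recommender sketch; file and column names are assumed.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv("imdb_movies.csv")  # hypothetical export of the dataset

# Describe each movie by its genre labels plus synopsis text.
text = df["genre"].fillna("") + " " + df["synopsis"].fillna("")
tfidf = TfidfVectorizer(stop_words="english").fit_transform(text)

def recommend(title, k=5):
    """Return the k titles most similar to `title` by TF-IDF cosine similarity."""
    idx = df.index[df["title"] == title][0]
    sims = cosine_similarity(tfidf[idx], tfidf).ravel()
    top = sims.argsort()[::-1][1 : k + 1]  # best matches, skipping the movie itself
    return df.loc[top, "title"].tolist()

print(recommend("The Matrix"))  # example query title
```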
U.S. Government Works: https://www.usa.gov/government-works
The research focus in the field of remotely sensed imagery has shifted from the collection and warehousing of data, tasks for which a mature technology already exists, to the auto-extraction of information and knowledge discovery from this valuable resource, tasks for which technology is still under active development. In particular, intelligent algorithms for the analysis of very large rasters, either high-resolution images or medium-resolution global datasets, which are becoming more and more prevalent, are lacking. We propose to develop the Geospatial Pattern Analysis Toolbox (GeoPAT), a computationally efficient, scalable, and robust suite of algorithms that supports GIS processes such as segmentation, unsupervised/supervised classification of segments, query and retrieval, and change detection in giga-pixel and larger rasters. At the core of the technology that underpins GeoPAT is the novel concept of pattern-based image analysis. Unlike pixel-based or object-based (OBIA) image analysis, GeoPAT partitions an image into overlapping square scenes containing 1,000 to 100,000 pixels and performs further processing on those scenes using pattern signatures and pattern similarity, concepts first developed in the field of Content-Based Image Retrieval. This fusion of methods from two different areas of research yields an orders-of-magnitude performance boost on very large images without sacrificing the quality of the output.
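To make the idea concrete, here is a toy sketch of pattern-based analysis in the spirit described above: partition a raster into square scenes, summarize each scene with a normalized histogram "signature", and compare signatures. This is not GeoPAT's actual implementation (GeoPAT uses overlapping scenes and richer signatures and similarity measures); it only illustrates the concept.

```python
# Toy pattern-signature sketch; GeoPAT itself uses overlapping scenes
# and more sophisticated signatures and similarity measures.
import numpy as np

def tile_signatures(raster, tile=100, bins=16):
    """Normalized histogram signature for each non-overlapping square scene."""
    h, w = raster.shape
    sigs = {}
    for i in range(0, h - tile + 1, tile):
        for j in range(0, w - tile + 1, tile):
            scene = raster[i:i + tile, j:j + tile]
            hist, _ = np.histogram(scene, bins=bins, range=(0, 256))
            sigs[(i, j)] = hist / hist.sum()  # normalize to a distribution
    return sigs

def similarity(sig_a, sig_b):
    """Histogram intersection: 1.0 means identical signatures, 0.0 disjoint."""
    return float(np.minimum(sig_a, sig_b).sum())
```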
GeoPAT v.1.0 already exists as a GRASS GIS add-on that has been developed and tested on medium-resolution continental-scale datasets, including the National Land Cover Dataset and the National Elevation Dataset. The proposed project will develop GeoPAT v.2.0, a much improved and extended version of the present software. We estimate an overall entry TRL for GeoPAT v.1.0 of 3-4 and a planned exit TRL for GeoPAT v.2.0 of 5-6. Moreover, several important new functionalities will be added. Proposed improvements include converting GeoPAT from a GRASS add-on into stand-alone software capable of being integrated with other systems, fully implementing a web-based interface, writing new modules to extend its applicability to high-resolution images/rasters and medium-resolution climate data, extending it to the spatio-temporal domain, enabling hierarchical search and segmentation, developing improved pattern signatures and similarity measures, parallelizing the code, and implementing a divide-and-conquer strategy to speed up selected modules.
The proposed technology will contribute to a wide range of Earth Science investigations and missions by enabling the extraction of information from diverse types of very large datasets. Analyzing an entire dataset without needing to sub-divide it due to software limitations offers the important advantages of uniformity and consistency. We propose to demonstrate the GeoPAT technology on two specific applications. The first is a web-based, real-time, visual search engine for local physiography using query-by-example on the entire global-extent SRTM 90 m resolution dataset: the user selects a region where a process of interest is known to occur, and the search engine identifies other areas around the world with a similar physiographic character and thus the potential for a similar process. The second is monitoring urban areas in their entirety at high resolution, including mapping impervious surfaces and identifying settlements for improved disaggregation of census data.
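Building on the helper functions in the previous sketch, a toy query-by-example search might rank all scenes by signature similarity to a user-selected example scene, loosely mimicking the SRTM physiography search described above. The random raster stands in for real data.

```python
# Reuses tile_signatures() and similarity() from the sketch above.
import numpy as np

def query_by_example(sigs, query_key, top_k=5):
    """Rank all other scenes by similarity to the scene at `query_key`."""
    query_sig = sigs[query_key]
    ranked = sorted(
        ((similarity(query_sig, s), key) for key, s in sigs.items() if key != query_key),
        reverse=True,
    )
    return ranked[:top_k]

# Demo on a random raster standing in for a real elevation or image tile.
rng = np.random.default_rng(0)
raster = rng.integers(0, 256, size=(500, 500), dtype=np.uint8)
print(query_by_example(tile_signatures(raster), (0, 0)))
```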
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Ever wondered what people are saying about certain countries? Whether it's in a positive or negative light? What are the most commonly used phrases and words to describe a country? This dataset presents tweets in which a certain country is mentioned in the hashtags (e.g. #HongKong, #NewZealand). It covers around 150 countries. I've added an additional field called polarity, which holds the sentiment computed from the text field. Feel free to explore! Feedback is much appreciated!
Each row represents a tweet. Creation dates of the tweets range from 12/07/2020 to 25/07/2020, and the dataset will be updated on a monthly cadence.
- The country can be derived from the file_name field (this field is very Tableau-friendly when it comes to plotting maps).
- The date at which the tweet was created is in the created_at field.
- The search query used to query the Twitter Search Engine is in the search_query field.
- The tweet's full text is in the text field.
- The sentiment is in the polarity field (computed with the VADER model from NLTK).
There may be slight duplication of tweet IDs before 22/07/2020; this bug has since been fixed.
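For reference, here is a minimal sketch of how the polarity field could be reproduced with NLTK's VADER model, including de-duplication on tweet IDs. The file name and the "id" column name are assumptions; the "text" and "polarity" columns follow the description above.

```python
# Sketch of reproducing the polarity field; "HongKong.csv" and the "id"
# column are assumptions, while "text" and "polarity" follow the description.
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

df = pd.read_csv("HongKong.csv")       # hypothetical per-country file
df = df.drop_duplicates(subset="id")   # guard against duplicated tweet IDs

# VADER's compound score ranges from -1 (most negative) to +1 (most positive).
df["polarity"] = df["text"].apply(lambda t: sia.polarity_scores(str(t))["compound"])
print(df[["text", "polarity"]].head())
```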
Thanks to the tweepy package for making data extraction via the Twitter API so easy.
Feel free to check out my blog if you want to learn how I built the data lake via AWS, or for other data shenanigans.
Here's an App I built using a live version of this data.
According to our latest research, the Quantum-Enhanced NLP Translation Engine market size reached USD 1.63 billion in 2024 globally, driven by rapid advancements in quantum computing and natural language processing (NLP) technologies. The market is projected to grow at a robust CAGR of 28.4% from 2025 to 2033, culminating in a forecasted value of USD 13.47 billion by 2033. This remarkable growth is primarily attributed to the increasing demand for real-time, highly accurate language translation across various sectors, coupled with the integration of quantum computing capabilities that significantly enhance NLP performance and efficiency.
One of the primary growth factors for the Quantum-Enhanced NLP Translation Engine market is the exponential rise in cross-border business operations and global digital content consumption. As enterprises expand internationally, the need for seamless and contextually accurate language translation becomes critical. Quantum-enhanced solutions offer a significant leap over conventional NLP engines by processing complex language structures, idioms, and dialects with unprecedented speed and accuracy. These capabilities are particularly beneficial for multinational corporations, global e-commerce platforms, and international legal firms, all of which require real-time translation for effective communication, compliance, and customer engagement. Moreover, the integration of machine learning with quantum computing further refines translation accuracy, making these engines indispensable in today's interconnected world.
Another key driver is the adoption of quantum-enhanced NLP translation engines in high-stakes industries such as healthcare, finance, and government. In healthcare, accurate translation of medical documents, patient records, and research publications is vital for delivering quality care to diverse populations. Quantum NLP engines ensure that medical terminology and nuanced language are interpreted correctly, reducing the risk of errors and improving patient outcomes. Similarly, in finance, these engines facilitate the translation of complex financial documents, regulatory filings, and market analyses, enabling institutions to operate seamlessly across linguistic boundaries. Government agencies also leverage quantum-enhanced translation for diplomatic communications, intelligence analysis, and public service delivery, where accuracy and confidentiality are paramount.
The ongoing evolution of quantum computing hardware and its integration with NLP algorithms is accelerating market growth. As quantum processors become more accessible and scalable, they enable NLP engines to handle larger datasets and more languages simultaneously. This technological synergy is fostering innovation in translation engine software, driving the development of customizable, industry-specific solutions. Service providers are also capitalizing on this trend by offering managed translation services powered by quantum-enhanced NLP, catering to organizations that lack in-house expertise. The convergence of hardware, software, and services within this ecosystem is creating new revenue streams and expanding the addressable market, particularly among enterprises seeking to gain a competitive edge through advanced language capabilities.
The emergence of the Quantum-Enhanced Neural Search Engine is set to revolutionize the way information is retrieved and processed across various domains. By leveraging the principles of quantum computing, this advanced search engine can analyze and interpret vast datasets with unprecedented speed and accuracy. Unlike traditional search engines, which rely heavily on keyword matching and basic algorithms, the Quantum-Enhanced Neural Search Engine utilizes complex neural networks to understand context, semantics, and user intent. This capability is particularly beneficial in fields such as healthcare, finance, and legal, where precise information retrieval is crucial for decision-making and compliance. As organizations increasingly seek to harness the power of quantum computing, the integration of neural search engines is expected to drive significant advancements in data analytics, knowledge management, and artificial intelligence applications.
From a regional perspective, North America currently dominates the Quantum-Enhanced NLP Translation Engine market.
https://creativecommons.org/publicdomain/zero/1.0/
Finding the perfect wedding venue for your special day can be a daunting task, and with so many options out there, it’s hard to know where to begin. This dataset contains information about wedding venues in Barcelona that will make it easier to determine the best place for you and your partner to get married.
You can use this data to compare venues and prices, search for promotions, and more. With details such as ratings, location, price range, and capacity on offer, you'll have everything you need at your fingertips when making an informed decision. We've also provided links to each venue's website so that you can explore further before committing.
How to Use This Dataset
This dataset provides detailed information about wedding venues in Barcelona, making it an invaluable resource for couples looking to plan the perfect wedding. With this data, couples can easily compare venues to find one that fits their budget and the atmosphere they are looking for.
#### Step 1: Explore Venue Data
Browse the venue records and compare ratings, locations, price ranges, and capacities to shortlist options that fit your budget and guest count.
#### Step 2: Consider Promotions & Special Deals
Take advantage of any available promotions or special deals associated with potential venues by examining the Promotion column in this dataset. Consider these discounts when comparing prices and making decisions on booking a particular wedding venue in Barcelona.
#### Step 3: Check Out Online Reviews & Follow Up With Vendors
Visit each potential vendor's website to get more information on reviews left by previous customers and on additional amenities offered at each location. In addition, reach out directly to vendors using the contact links provided by the URLs listed in this dataset if you have additional questions or unique preferences related to planning your big day!
- Creating a search engine or tool to help couples find wedding venues that meet their needs based on various criteria (e.g. budget, rating, location, capacity); see the sketch after the column table below.
- Producing a heat map of Barcelona to showcase the concentration of popular wedding venues in an easily digestible form.
- Creating an algorithm that recommends wedding venues personalized for each couple, based on data such as budget, location preferences, and necessary amenities/services for their special day.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: noces_info_uoc.csv

| Column name | Description |
|:------------|:------------|
| Spot | Name of the wedding venue. (String) |
| Rating | Rating of the wedding venue. (Integer) |
| Location | Location of the wedding venue. (String) |
| Promotion | Any available promotions for the wedding venue. (String) |
| Price | Price for booking, with details about packages and whether any auxiliary services or rituals are included. (Integer) |
| Num_people | Maximum number of people that can be accommodated at once. (Integer) |
| URL | Link to a website containing more information. (String) |
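Here is a minimal sketch of the search/filter use case mentioned earlier, using the columns from the table above; the thresholds are examples only.

```python
# Example filter over the documented columns; thresholds are illustrative.
import pandas as pd

df = pd.read_csv("noces_info_uoc.csv")

# Venues for at least 120 guests, rated 4 or higher, with a promotion on offer.
matches = df[
    (df["Num_people"] >= 120)
    & (df["Rating"] >= 4)
    & (df["Promotion"].notna())
]
print(matches[["Spot", "Location", "Price", "URL"]].sort_values("Price"))
```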
Google's energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, smart temperature and lighting controls, advanced cooling techniques, and machine learning.

Data centers and energy

Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills.

Reducing Google's carbon footprint

Google's clean energy efforts are also related to its efforts to reduce its carbon footprint. Since committing to using 100 percent renewable energy, the company has met its targets largely through solar and wind energy power purchase agreements and by buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.