Austin's Communications & Technology Management Department is pleased to provide this zip code dataset for general use, designed to support a variety of research and analysis needs. Please note that while we facilitate access to this data, the dataset is owned and produced by the United States Postal Service (USPS). Users are encouraged to acknowledge USPS as the source when utilizing this dataset in their work.

U.S. ZIP Code Areas (Five-Digit) represents five-digit ZIP Code areas used by the U.S. Postal Service to deliver mail more effectively. The first digit of a five-digit ZIP Code divides the United States into 10 large groups of states, numbered from 0 in the Northeast to 9 in the far West. Within these areas, each state is divided into an average of 10 smaller geographical areas, identified by the second and third digits. These digits, in conjunction with the first digit, represent a sectional center facility or a mail processing facility area. The fourth and fifth digits identify a post office, station, branch, or local delivery area.

This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. It does not represent an on-the-ground survey and represents only the approximate relative location of property boundaries. This product has been produced by the City of Austin for the sole purpose of geographic reference. No warranty is made by the City of Austin regarding specific accuracy or completeness.
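To make the digit breakdown concrete, here is a minimal sketch that splits a five-digit ZIP Code into the components described above (the helper function and its return format are our own illustration, not part of the dataset):

```
def zip_components(zip_code: str) -> dict:
    # Split a five-digit ZIP Code into the positional components described above.
    assert len(zip_code) == 5 and zip_code.isdigit()
    return {
        "national_area": zip_code[0],      # 0 (Northeast) through 9 (far West)
        "sectional_center": zip_code[:3],  # digits 1-3: sectional center / processing facility area
        "delivery_area": zip_code[3:],     # digits 4-5: post office, station, branch, or delivery area
    }

print(zip_components("78701"))  # a downtown Austin, TX ZIP Code
```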
This dataset contains daily aggregated measurements of PM2.5 from ambient wildfire smoke in the contiguous United States, spanning 2006 to 2016. The data is sourced from Childs et al. (2022), "Daily Local-Level Estimates of Ambient Wildfire Smoke PM2.5 for the Contiguous US," published in Environmental Science & Technology. To standardize values across zip codes, we performed the weight calculation on a 10 km grid: for each zip code polygon we obtained area weights that sum to 1, then used them to compute area-weighted smoke values per zip code for the aforementioned period. Those interested in replicating our data processing pipeline can access it at https://github.com/NSAPH-Data-Processing/smoke_aggregation.
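For readers who want the gist without opening the repository, a minimal sketch of the area-weighting step using geopandas follows. The file and column names (smoke_grid_10km.gpkg, zcta_polygons.gpkg, smoke_pm25, zip) are hypothetical; the linked pipeline is the authoritative implementation.

```
import geopandas as gpd

# Hypothetical inputs, reprojected to an equal-area CRS so polygon areas are meaningful
grid = gpd.read_file("smoke_grid_10km.gpkg").to_crs("EPSG:5070")  # 10 km cells with a "smoke_pm25" column
zips = gpd.read_file("zcta_polygons.gpkg").to_crs("EPSG:5070")    # zip code polygons with a "zip" column

# Intersect grid cells with zip code polygons
pieces = gpd.overlay(grid, zips, how="intersection")
pieces["piece_area"] = pieces.geometry.area

# Area weights that sum to 1 within each zip code polygon
pieces["weight"] = pieces["piece_area"] / pieces.groupby("zip")["piece_area"].transform("sum")

# Area-weighted smoke PM2.5 per zip code
zip_smoke = (pieces["smoke_pm25"] * pieces["weight"]).groupby(pieces["zip"]).sum()
print(zip_smoke.head())
```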
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PDFM Embeddings are condensed vector representations designed to encapsulate the complex, multidimensional interactions among human behaviors, environmental factors, and local contexts at specific locations. These embeddings capture patterns in aggregated data such as search trends, busyness trends, and environmental conditions (maps, air quality, temperature), providing a rich, location-specific snapshot of how populations engage with their surroundings. Aggregated over space and time, these embeddings ensure privacy while enabling nuanced spatial analysis and prediction for applications ranging from public health to socioeconomic modeling.
PDFM Embeddings are generated using a Graph Neural Network (GNN) model, trained on a rich set of features:

- Aggregated Search Trends: Regional interests and concerns reflected in search data.
- Aggregated Maps Data: Geospatial and contextual data about locations.
- Aggregated Busyness: Activity levels in specific areas, indicating density and frequency of human presence.
- Aggregated Weather and Air Quality: Climate-related metrics, including temperature and air quality.
These features are aggregated at the postal code and county levels to generate localized, context-aware embeddings that preserve privacy.
Embeddings are available for all counties and ZIP codes within the contiguous United States. For additional coverage, please reach out to pdfm-embeddings@google.com.
For more information on PDFM Embeddings, please see our paper on arXiv.
PDFM Embeddings can be applied to a wide range of geospatial prediction tasks, similar to census and socioeconomic statistics, with example use cases ranging from public health to socioeconomic modeling.
By incorporating spatial relationships and diverse feature types, these embeddings serve as a powerful tool for geospatial predictions.
Access to Population Dynamics Embeddings is subject to Google’s Terms of Service. Users can download the embeddings and associated files after completing the intake form.
To use Population Dynamics Embeddings, prepare ground truth data (e.g., target variable for prediction tasks like asthma prevalence) at the postal code or county level.
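As a rough sketch of that workflow, one can join a county-level ground truth file to the embeddings and fit a simple linear probe. The file and column names used here (county_embeddings.csv, asthma_prevalence.csv, place, feature*) are assumptions for illustration; the demo notebooks below show the supported path.

```
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; join on the shared region identifier
embeddings = pd.read_csv("county_embeddings.csv", index_col="place")  # embedding feature columns
labels = pd.read_csv("asthma_prevalence.csv", index_col="place")      # one ground-truth column

data = embeddings.join(labels, how="inner")
X = data.filter(like="feature")  # keep only the embedding dimensions
y = data["asthma_prevalence"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge().fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```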
Explore our demo notebooks to understand various use cases of PDFM Embeddings. The code provided is available under the Apache 2.0 license.
House Price Prediction based on city zipcode...
A home is often the largest and most expensive purchase a person makes in his or her lifetime. Ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost.
“Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.
Zillow Prize, a competition with a one-million-dollar grand prize, is challenging the data science community to help push the accuracy of the Zestimate even further. Winning algorithms stand to impact...
One of the key impacts of AMI (advanced metering infrastructure) technology is the availability of interval energy usage data, which can support the development of new products and services and enable the market to deliver greater value to customers. Requestors can now access anonymized interval energy usage data in 30-minute intervals for all zip codes where AMI meters have been deployed.
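As a small illustrative sketch (the file name and columns, ami_interval_usage.csv, zip_code, interval_start, and kwh, are assumptions about how such an extract might look, not the utility's published schema), 30-minute readings roll up to daily totals naturally with pandas:

```
import pandas as pd

# Hypothetical extract: one row per (zip code, 30-minute interval) with kWh usage
usage = pd.read_csv("ami_interval_usage.csv", parse_dates=["interval_start"])

# Roll the 30-minute readings up to daily totals per zip code
daily_kwh = (
    usage.set_index("interval_start")
         .groupby("zip_code")["kwh"]
         .resample("D")
         .sum()
)
print(daily_kwh.head())
```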
PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.
Key Features:
✅ 232M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
✅ 7.1M+ Active Job Openings – Updated in real time to reflect hiring demand.
✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.
Primary Attributes:
Job Metadata:
Salary Data (salary_data)
Occupational Data (onet_data) (object, nullable)
Additional Attributes:
📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.
PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset comprises a comprehensive collection of restaurant listings from Yellowpages.com, capturing essential information such as restaurant names, locations, contact details, cuisine types, and customer ratings. This data is invaluable for those looking to analyze trends in the restaurant industry, perform competitive analysis, or build recommendation systems.
This dataset has the following fields:
- Url
- Name
- Street
- Zip Code
- City
- State
- Phone
- Email
- Website
- Categories – a comma-delimited (,) list of categories the listing in question falls under. Most listings are placed in multiple categories.

Whether you're studying the geographic distribution of restaurants, examining customer preferences, or developing location-based services, this dataset provides a robust foundation for your project. It offers a snapshot of the restaurant landscape as captured from a widely-used business directory, making it a versatile resource for various analytical and development purposes.
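Because the Categories field is comma-delimited, a common first step is to split it into one row per listing-category pair. A minimal sketch, assuming a hypothetical CSV export (yellowpages_restaurants.csv) with the fields above:

```
import pandas as pd

# Hypothetical file name for a CSV export with the fields listed above
df = pd.read_csv("yellowpages_restaurants.csv")

# One row per (listing, category) pair
df["Categories"] = df["Categories"].str.split(",")
per_category = df.explode("Categories")
per_category["Categories"] = per_category["Categories"].str.strip()

# The ten most common restaurant categories
print(per_category["Categories"].value_counts().head(10))
```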
For more specific or up-to-date data, or if you need tailored datasets from other platforms, consider leveraging custom web scraping services. PromptCloud offers flexible and scalable data extraction solutions to meet your unique needs, allowing you to focus on analysis and decision-making without worrying about data collection. https://www.promptcloud.com/web-scraping-services/
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset I'm using to run an inference notebook using setfit (sentence transformers framework). Framework github: https://github.com/huggingface/setfit
This process is necessary when you're using setfit in a competition where internet access is not allowed.
To install the setfit package using this dataset, run the following code:
```
# To build the package dataset, this was used:
# !pip download setfit --dest "/kaggle/working/setfit-package"
# !zip -r "/kaggle/working/setfit-package.zip" "/kaggle/working/setfit-package"

import os
import shutil

print("Copying packages...")
print(os.listdir("/kaggle"))

# Stage the downloaded wheels in a writable working directory
os.makedirs("/kaggle/working/packages", exist_ok=True)
shutil.copytree(
    "/kaggle/input/setfit-0-1-1-zip-package/setfit-package/kaggle/working/setfit-package",
    "/kaggle/working/packages/setfit-package",
    dirs_exist_ok=True,
)
shutil.copy(
    "/kaggle/input/setfit-0-1-1-zip-package/sentence_transformers-2.2.2-py3-none-any.whl",
    "/kaggle/working/packages/setfit-package/sentence_transformers-2.2.2-py3-none-any.whl",
)
print("Copied!")
print(os.listdir("/kaggle/working/packages/setfit-package"))

# Install offline, resolving packages only from the local directory
!pip install --no-index --find-links=file:///kaggle/working/packages/setfit-package setfit
```
And then, to import the model and run inference:

```
import sys
from unittest.mock import MagicMock

# setfit pulls in the evaluate package, which is unavailable offline; stub it out
sys.modules["evaluate"] = MagicMock()

from setfit import SetFitModel

model_path = "/path/to/your/pretrained_weights"
model = SetFitModel.from_pretrained(model_path)
```