8 datasets found
  1. Boundaries: US Zip Codes

    • gimi9.com
    Cite
    Boundaries: US Zip Codes [Dataset]. https://gimi9.com/dataset/data-gov_boundaries-us-zip-codes/
    Description

    Austin's Communications & Technology Management Department is pleased to provide this zip code dataset for general use, designed to support a variety of research and analysis needs. Please note that while we facilitate access to this data, the dataset is owned and produced by the United States Postal Service (USPS). Users are encouraged to acknowledge USPS as the source when utilizing this dataset in their work.

    U.S. ZIP Code Areas (Five-Digit) represents five-digit ZIP Code areas used by the U.S. Postal Service to deliver mail more effectively. The first digit of a five-digit ZIP Code divides the United States into 10 large groups of states numbered from 0 in the Northeast to 9 in the far West. Within these areas, each state is divided into an average of 10 smaller geographical areas, identified by the second and third digits. These digits, in conjunction with the first digit, represent a sectional center facility or a mail processing facility area. The fourth and fifth digits identify a post office, station, branch or local delivery area.

    This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. It does not represent an on-the-ground survey and represents only the approximate relative location of property boundaries. This product has been produced by the City of Austin for the sole purpose of geographic reference. No warranty is made by the City of Austin regarding specific accuracy or completeness.
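    The digit breakdown described above can be sketched in code (a minimal illustration based on the USPS description; `decompose_zip` is a hypothetical helper, not part of the dataset or any USPS API):

```python
def decompose_zip(zip_code: str) -> dict:
    """Split a five-digit ZIP Code into the regions described by USPS.

    - digit 1: one of 10 large groups of states (0 = Northeast, 9 = far West)
    - digits 1-3: sectional center facility / mail processing facility area
    - digits 4-5: post office, station, branch, or local delivery area
    """
    if len(zip_code) != 5 or not zip_code.isdigit():
        raise ValueError("expected a five-digit ZIP Code")
    return {
        "national_area": zip_code[0],   # broad state grouping
        "scf_prefix": zip_code[:3],     # sectional center facility area
        "delivery_area": zip_code[3:],  # local post office / delivery area
    }

print(decompose_zip("78701"))  # an Austin, TX ZIP Code
```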

  2. Wildfire Smoke PM2.5 per Zipcode

    • search.dataone.org
    • dataverse.harvard.edu
    • +1more
    Updated Feb 9, 2024
    Cite
    Irene, Kezia; Audirac, Michelle; Spoto, Federica; Childs, Marissa L.; Dominici, Francesca; Braun, Danielle (2024). Wildfire Smoke PM2.5 per Zipcode [Dataset]. http://doi.org/10.7910/DVN/VHNJBD
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Irene, Kezia; Audirac, Michelle; Spoto, Federica; Childs, Marissa L.; Dominici, Francesca; Braun, Danielle
    Description

    This dataset contains daily aggregated measurements of PM2.5 from ambient wildfire smoke in the contiguous United States, spanning 2006 to 2016. The data is sourced from a study by Childs et al. (2022), titled "Daily Local-Level Estimates of Ambient Wildfire Smoke PM2.5 for the Contiguous US," published in Environmental Science & Technology. To standardize values across zip codes, we performed the weight calculation on a 10 km grid: for each zip code polygon, we obtained area weights that sum to 1, which allowed us to calculate smoke values per zip code for the aforementioned period. Those interested in replicating our data processing pipeline can access it at https://github.com/NSAPH-Data-Processing/smoke_aggregation.
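    The area-weighting step can be sketched as follows (a simplified, dependency-free illustration; the real pipeline at the linked repository operates on actual grid and zip code geometries):

```python
def zip_smoke_value(cell_values, overlap_areas):
    """Aggregate gridded smoke PM2.5 to a single zip code polygon.

    cell_values:   {cell_id: pm25} smoke value on each 10 km grid cell
    overlap_areas: {cell_id: area} intersection area between the zip code
                   polygon and each grid cell (any consistent unit)

    Area weights are normalized to sum to 1 within the polygon, so the
    result is an area-weighted average over the overlapping cells.
    """
    total = sum(overlap_areas.values())
    weights = {cid: a / total for cid, a in overlap_areas.items()}
    return sum(cell_values[cid] * w for cid, w in weights.items())

# A zip code polygon split 75% / 25% between two grid cells:
print(zip_smoke_value({"a": 4.0, "b": 8.0}, {"a": 3.0, "b": 1.0}))  # 5.0
```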

  3. Population Dynamics Embeddings

    • kaggle.com
    zip
    Updated Dec 3, 2024
    Cite
    Ailurophile (2024). Population Dynamics Embeddings [Dataset]. https://www.kaggle.com/datasets/veeralakrishna/population-dynamics-embeddings
    Explore at:
    zip (160984789 bytes), available download formats
    Dataset updated
    Dec 3, 2024
    Authors
    Ailurophile
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Population Dynamics Foundation Model (PDFM) Embeddings

    PDFM Embeddings are condensed vector representations designed to encapsulate the complex, multidimensional interactions among human behaviors, environmental factors, and local contexts at specific locations. These embeddings capture patterns in aggregated data such as search trends, busyness trends, and environmental conditions (maps, air quality, temperature), providing a rich, location-specific snapshot of how populations engage with their surroundings. Aggregated over space and time, these embeddings ensure privacy while enabling nuanced spatial analysis and prediction for applications ranging from public health to socioeconomic modeling.

    Overview

    PDFM Embeddings are generated using a Graph Neural Network (GNN) model, trained on a rich set of features:

    • Aggregated Search Trends: Regional interests and concerns reflected in search data.
    • Aggregated Maps Data: Geospatial and contextual data about locations.
    • Aggregated Busyness: Activity levels in specific areas, indicating density and frequency of human presence.
    • Aggregated Weather and Air Quality: Climate-related metrics, including temperature and air quality.

    These features are aggregated at the postal code and county levels to generate localized, context-aware embeddings that preserve privacy.

    Embeddings are available for all counties and ZIP codes within the contiguous United States. For additional coverage, please reach out to pdfm-embeddings@google.com.

    Paper

    For more information on PDFM Embeddings, please see our paper on arXiv.

    Applications

    PDFM Embeddings can be applied to a wide range of geospatial prediction tasks, similar to census and socioeconomic statistics. Example use cases include:

    • Population Health Outcomes: Predicting health statistics like disease prevalence or population health risks.
    • Socioeconomic Factors: Modeling economic indicators and living conditions.
    • Retail: Identifying promising locations for new stores, market expansion, and demand forecasting.
    • Marketing and Sales: Characterizing high-performance regions and identifying similar areas to optimize marketing and sales efforts.

    By incorporating spatial relationships and diverse feature types, these embeddings serve as a powerful tool for geospatial predictions.

    Getting Access to the Embeddings

    Access to Population Dynamics Embeddings is subject to Google’s Terms of Service. Users can download the embeddings and associated files after completing the intake form.

    Using the Embeddings

    Prepare Ground Truth Data

    To use Population Dynamics Embeddings, prepare ground truth data (e.g., target variable for prediction tasks like asthma prevalence) at the postal code or county level.

    Option 1: Incorporate Embeddings into an Existing Model

    1. Prepare Existing Model-Based Ground Truth: Use the embeddings as geospatial covariates to enhance an existing model.
    2. Train an Adapter Model: Improve an existing model by integrating the embeddings.

    Option 2: Tune for Specific Use Cases

    1. Choose a Prediction Model: Any model, such as GBDT, MLP, or linear, can be used for predictions.
    2. Use Embeddings for Prediction: Use PDFM Embeddings as input features, alongside other contextual data, to improve prediction accuracy.
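    Option 2 can be sketched as follows (synthetic data and an ordinary least squares model stand in for real PDFM embeddings and a GBDT/MLP; the shapes and target variable are assumptions for illustration):

```python
import numpy as np

# Synthetic stand-ins: one embedding row per ZIP code, and a target variable
# (e.g. a health statistic) that is a noisy linear function of the embedding.
rng = np.random.default_rng(0)
n_zips, dim = 500, 32
embeddings = rng.normal(size=(n_zips, dim))
true_w = rng.normal(size=dim)
target = embeddings @ true_w + rng.normal(scale=0.1, size=n_zips)

# Hold out 20% of ZIP codes, fit a linear model on the rest
split = int(0.8 * n_zips)
X_tr, y_tr = embeddings[:split], target[:split]
X_te, y_te = embeddings[split:], target[split:]
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# Evaluate on the held-out ZIP codes
pred = X_te @ w
r2 = 1 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print(f"held-out R^2: {r2:.3f}")
```

    In practice the embeddings would be joined to ground truth by postal code or county identifier, and any model (GBDT, MLP, linear) can replace the least-squares fit.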

    Demos / Notebooks

    Explore our demo notebooks to understand various use cases of PDFM Embeddings. The code provided is available under the Apache 2.0 license.

    • Nowcasting Colab: Here the model uses past and partial present-day data for a target variable at county level to predict outcomes for remaining counties.
    • Superresolution and Imputation Colab: Here we use the embeddings to help train a model at the county level on a target variable to predict at the zip code level. This model also demonstrates imputation capabilities by training on 20% of zip codes and predicting for the remaining 80%.
    • Forecasting with TimesFM Colab: In this experimental use case, we incorporate TimesFM (a Univariate Forecasting Model) to perform spatiotemporal forecasting. The embeddings are used to adjust for errors in the forecasts and improve their accuracy.
    • Nighttime Lights Prediction with Earth Engine...
  4. House prediction for zipcode

    • kaggle.com
    zip
    Updated Jan 16, 2019
    Cite
    abhi reddy (2019). House prediction for zipcode [Dataset]. https://www.kaggle.com/abhisheikreddy646/house-prediction-for-zipcode
    Explore at:
    zip (1860 bytes), available download formats
    Dataset updated
    Jan 16, 2019
    Authors
    abhi reddy
    Description

    Context

    House Price Prediction based on city zipcode...

    Content

    A home is often the largest and most expensive purchase a person makes in his or her lifetime. Ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost.

    Acknowledgements

    “Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

    Inspiration

    Zillow Prize, a competition with a one million dollar grand prize, is challenging the data science community to help push the accuracy of the Zestimate even further. Winning algorithms stand to impact

  5. Data from: ComEd's anonymized AMI energy usage data

    • openenergyhub.ornl.gov
    Updated Jul 30, 2024
    Cite
    (2024). ComEd's anonymized AMI energy usage data [Dataset]. https://openenergyhub.ornl.gov/explore/dataset/comed-s-anonymized-ami-energy-usage-data/
    Dataset updated
    Jul 30, 2024
    Description

    One of the key impacts of AMI technology is the availability of interval energy usage data, which can support the development of new products and services and enable the market to deliver greater value to customers. Requestors can now access anonymized interval energy usage data at 30-minute intervals for all zip codes where AMI meters have been deployed.
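    Rolling such 30-minute interval readings up to daily usage per zip code can be sketched like this (the column names `zip_code`, `interval_start`, and `kwh` are assumptions for illustration, not ComEd's published schema):

```python
import pandas as pd

# Toy sample of 30-minute AMI interval readings
readings = pd.DataFrame({
    "zip_code": ["60601"] * 4 + ["60602"] * 2,
    "interval_start": pd.to_datetime([
        "2024-07-01 00:00", "2024-07-01 00:30",
        "2024-07-01 01:00", "2024-07-02 00:00",
        "2024-07-01 00:00", "2024-07-01 00:30",
    ]),
    "kwh": [0.4, 0.5, 0.6, 0.7, 1.0, 1.2],
})

# Sum the 48 half-hour intervals of each day into a daily total per zip code
daily = (readings
         .assign(date=readings["interval_start"].dt.date)
         .groupby(["zip_code", "date"])["kwh"].sum()
         .reset_index())
print(daily)
```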

  6. Global Web Data | Web Scraping Data | Job Postings Data | Source: Company...

    • datarade.ai
    .json
    Cite
    PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 232M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads
    Explore at:
    .json, available download formats
    Dataset authored and provided by
    PredictLeads
    Area covered
    El Salvador, Comoros, Virgin Islands (British), Northern Mariana Islands, Bonaire, Kuwait, Bosnia and Herzegovina, Guadeloupe, French Guiana, Kosovo
    Description

    PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

    Key Features:

    ✅ 232M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
    ✅ 7.1M+ Active Job Openings – Updated in real time to reflect hiring demand.
    ✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
    ✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
    ✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
    ✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

    Primary Attributes:

    • id (string, UUID) – Unique identifier for the job posting.
    • type (string, constant: "job_opening") – Object type.
    • title (string) – Job title.
    • description (string) – Full job description, extracted from the job listing.
    • url (string, URL) – Direct link to the job posting.
    • first_seen_at – Timestamp when the job was first detected.
    • last_seen_at – Timestamp when the job was last detected.
    • last_processed_at – Timestamp when the job data was last processed.

    Job Metadata:

    • contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").
    • categories (array of strings) – Job categories (e.g., "engineering", "marketing").
    • seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").
    • status (string) – Job status (e.g., "open", "closed").
    • language (string) – Language of the job posting.
    • location (string) – Full location details as listed in the job description.
    • location_data (array of objects) – Structured location details:
      • city (string, nullable) – City where the job is located.
      • state (string, nullable) – State or region of the job location.
      • zip_code (string, nullable) – Postal/ZIP code.
      • country (string, nullable) – Country where the job is located.
      • region (string, nullable) – Broader geographical region.
      • continent (string, nullable) – Continent name.
      • fuzzy_match (boolean) – Indicates whether the location was inferred.

    Salary Data (salary_data)

    • salary (string) – Salary range extracted from the job listing.
    • salary_low (float, nullable) – Minimum salary in original currency.
    • salary_high (float, nullable) – Maximum salary in original currency.
    • salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").
    • salary_low_usd (float, nullable) – Converted minimum salary in USD.
    • salary_high_usd (float, nullable) – Converted maximum salary in USD.
    • salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

    Occupational Data (onet_data) (object, nullable)

    • code (string, nullable) – ONET occupation code.
    • family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").
    • occupation_name (string, nullable) – Official ONET occupation title.

    Additional Attributes:

    • tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").

    📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

    PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset
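    The record shape described above could be modeled as follows (a sketch using Python dataclasses with a subset of the listed fields; this is illustrative, not PredictLeads' official client library, and the sample values are made up):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LocationData:
    # All location fields are nullable in the source schema
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None
    country: Optional[str] = None
    region: Optional[str] = None
    continent: Optional[str] = None
    fuzzy_match: bool = False  # True when the location was inferred

@dataclass
class JobOpening:
    id: str          # UUID of the posting
    type: str        # constant: "job_opening"
    title: str
    url: str
    contract_types: list = field(default_factory=list)
    seniority: Optional[str] = None
    status: Optional[str] = None
    location_data: list = field(default_factory=list)

job = JobOpening(
    id="123e4567-e89b-12d3-a456-426614174000",
    type="job_opening",
    title="Data Engineer",
    url="https://example.com/jobs/1",
    location_data=[LocationData(city="Austin", state="TX", zip_code="78701")],
)
print(job.location_data[0].zip_code)  # 78701
```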

  7. Restaurants on Yellowpages.com

    • kaggle.com
    zip
    Updated Sep 15, 2017
    Cite
    PromptCloud (2017). Restaurants on Yellowpages.com [Dataset]. https://www.kaggle.com/PromptCloudHQ/restaurants-on-yellowpagescom
    Explore at:
    zip (464849 bytes), available download formats
    Dataset updated
    Sep 15, 2017
    Dataset authored and provided by
    PromptCloud
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset comprises a comprehensive collection of restaurant listings from Yellowpages.com, capturing essential information such as restaurant names, locations, contact details, cuisine types, and customer ratings. This data is invaluable for those looking to analyze trends in the restaurant industry, perform competitive analysis, or build recommendation systems.

    This dataset has the following fields:

    • Url
    • Name
    • Street
    • Zip Code
    • City
    • State
    • Phone
    • Email
    • Website
    • Categories - A comma-delimited (,) list of categories the listing in question falls under. Most listings are placed in multiple categories.
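    The comma-delimited Categories field can be split into individual labels like so (the sample row is made up for illustration):

```python
# One listing row, with Categories as a comma-delimited string
row = {
    "Name": "Joe's Diner",
    "Zip Code": "10001",
    "Categories": "Restaurants,American Restaurants,Breakfast & Brunch",
}

# Split on commas and strip stray whitespace around each label
categories = [c.strip() for c in row["Categories"].split(",")]
print(categories)  # ['Restaurants', 'American Restaurants', 'Breakfast & Brunch']
```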

    Whether you’re studying the geographic distribution of restaurants, examining customer preferences, or developing location-based services, this dataset provides a robust foundation for your project. It offers a snapshot of the restaurant landscape as captured from a widely-used business directory, making it a versatile resource for various analytical and development purposes.

    https://www.promptcloud.com/datastock-access-ready-to-use-datasets/?utm_source=yp-kaggle&utm_medium=referral

    For more specific or up-to-date data, or if you need tailored datasets from other platforms, consider leveraging custom web scraping services. PromptCloud offers flexible and scalable data extraction solutions to meet your unique needs, allowing you to focus on analysis and decision-making without worrying about data collection. https://www.promptcloud.com/web-scraping-services/

  8. setfit-0-1-1-zip-package

    • kaggle.com
    zip
    Updated Oct 7, 2022
    Cite
    Paolo Rechia (2022). setfit-0-1-1-zip-package [Dataset]. https://www.kaggle.com/datasets/paolorechia/setfit-0-1-1-zip-package
    Explore at:
    zip (937616396 bytes), available download formats
    Dataset updated
    Oct 7, 2022
    Authors
    Paolo Rechia
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a dataset I'm using to run an inference notebook using setfit (sentence transformers framework). Framework github: https://github.com/huggingface/setfit

    This process is necessary when you're using setfit in a competition where internet access is not allowed.

    To install the setfit package using this dataset, you need the following code (it's very dirty, I didn't bother cleaning it up):

    # To build the package dataset, this was used:
    # !pip download setfit --dest "/kaggle/working/setfit-package"
    # !zip -r "/kaggle/working/setfit-package.zip" "/kaggle/working/setfit-package"

    import os
    import shutil

    print("Copying packages...")
    print(os.listdir("/kaggle"))

    # Create the target directories (no error if they already exist)
    os.makedirs("/kaggle/working/packages", exist_ok=True)

    print(os.listdir("/kaggle/input/setfit-0-1-1-zip-package/setfit-package/kaggle"))

    # Copy the downloaded package tree and the sentence-transformers wheel
    try:
        shutil.copytree(
            "/kaggle/input/setfit-0-1-1-zip-package/setfit-package/kaggle/working/setfit-package",
            "/kaggle/working/packages/setfit-package")
    except FileExistsError:
        pass
    shutil.copy(
        "/kaggle/input/setfit-0-1-1-zip-package/sentence_transformers-2.2.2-py3-none-any.whl",
        "/kaggle/working/packages/setfit-package/sentence_transformers-2.2.2-py3-none-any.whl")
    print("Copied!")

    print(os.listdir("/kaggle/working/packages/setfit-package"))

    # Install offline from the local package directory
    !pip install --no-index --find-links=file:///kaggle/working/packages/setfit-package setfit

    And then to import the model to run inference:

    import sys

    # Mock the evaluate library, since it brings version compatibility errors,
    # and we're not evaluating anything here...
    from unittest.mock import MagicMock
    sys.modules["evaluate"] = MagicMock()

    # Ok, now we need the library installed
    from setfit import SetFitModel

    # You should also have weights available somewhere.
    # I didn't upload the original weights since I'm training locally,
    # i.e., I only upload the final weights to Kaggle.
    model_path = "/path/to/your/pretrained_weights"
    model = SetFitModel.from_pretrained(model_path)
