8 datasets found
  1. Boundaries: US Zip Codes

    • gimi9.com
    Cite
    Boundaries: US Zip Codes [Dataset]. https://gimi9.com/dataset/data-gov_boundaries-us-zip-codes/
    Description

    Austin's Communications & Technology Management Department is pleased to provide this zip code dataset for general use, designed to support a variety of research and analysis needs. Please note that while we facilitate access to this data, the dataset is owned and produced by the United States Postal Service (USPS). Users are encouraged to acknowledge USPS as the source when utilizing this dataset in their work.

    U.S. ZIP Code Areas (Five-Digit) represents five-digit ZIP Code areas used by the U.S. Postal Service to deliver mail more effectively. The first digit of a five-digit ZIP Code divides the United States into 10 large groups of states numbered from 0 in the Northeast to 9 in the far West. Within these areas, each state is divided into an average of 10 smaller geographical areas, identified by the second and third digits. These digits, in conjunction with the first digit, represent a sectional center facility or a mail processing facility area. The fourth and fifth digits identify a post office, station, branch or local delivery area.

    This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. It does not represent an on-the-ground survey and represents only the approximate relative location of property boundaries. This product has been produced by the City of Austin for the sole purpose of geographic reference. No warranty is made by the City of Austin regarding specific accuracy or completeness.
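    The digit breakdown described above can be sketched in code (a minimal illustration based on the USPS description; `decompose_zip` is a hypothetical helper, not part of the dataset or any USPS API):

```python
def decompose_zip(zip_code: str) -> dict:
    """Split a five-digit ZIP Code into the regions described by USPS.

    - digit 1: one of 10 large groups of states (0 = Northeast, 9 = far West)
    - digits 1-3: sectional center facility / mail processing facility area
    - digits 4-5: post office, station, branch, or local delivery area
    """
    if len(zip_code) != 5 or not zip_code.isdigit():
        raise ValueError("expected a five-digit ZIP Code")
    return {
        "national_area": zip_code[0],   # broad state grouping
        "scf_prefix": zip_code[:3],     # sectional center facility area
        "delivery_area": zip_code[3:],  # local post office / delivery area
    }

print(decompose_zip("78701"))  # an Austin, TX ZIP Code
```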

  2. Wildfire Smoke PM2.5 per Zipcode

    • search.dataone.org
    • dataverse.harvard.edu
    • +1more
    Updated Feb 9, 2024
    Cite
    Irene, Kezia; Audirac, Michelle; Spoto, Federica; Childs, Marissa L.; Dominici, Francesca; Braun, Danielle (2024). Wildfire Smoke PM2.5 per Zipcode [Dataset]. http://doi.org/10.7910/DVN/VHNJBD
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Irene, Kezia; Audirac, Michelle; Spoto, Federica; Childs, Marissa L.; Dominici, Francesca; Braun, Danielle
    Description

    This dataset contains daily aggregated measurements of PM2.5 from ambient wildfire smoke in the contiguous United States, spanning 2006 to 2016. The data is sourced from a study by Childs et al. (2022), titled "Daily Local-Level Estimates of Ambient Wildfire Smoke PM2.5 for the Contiguous US," published in Environmental Science & Technology. To standardize values across zip codes, we performed the weight calculation on a 10 km grid: for each zip code polygon, we obtained area weights that sum to 1, which allowed us to calculate smoke values per zip code for the aforementioned period. Those interested in replicating our data processing pipeline can access it at https://github.com/NSAPH-Data-Processing/smoke_aggregation.
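    The area-weighting step can be sketched as follows (a simplified, dependency-free illustration; the real pipeline at the linked repository operates on actual grid and zip code geometries):

```python
def zip_smoke_value(cell_values, overlap_areas):
    """Aggregate gridded smoke PM2.5 to a single zip code polygon.

    cell_values:   {cell_id: pm25} smoke value on each 10 km grid cell
    overlap_areas: {cell_id: area} intersection area between the zip code
                   polygon and each grid cell (any consistent unit)

    Area weights are normalized to sum to 1 within the polygon, so the
    result is an area-weighted average over the overlapping cells.
    """
    total = sum(overlap_areas.values())
    weights = {cid: a / total for cid, a in overlap_areas.items()}
    return sum(cell_values[cid] * w for cid, w in weights.items())

# A zip code polygon split 75% / 25% between two grid cells:
print(zip_smoke_value({"a": 4.0, "b": 8.0}, {"a": 3.0, "b": 1.0}))  # 5.0
```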

  3. Population Dynamics Embeddings

    • kaggle.com
    zip
    Updated Dec 3, 2024
    Cite
    Ailurophile (2024). Population Dynamics Embeddings [Dataset]. https://www.kaggle.com/datasets/veeralakrishna/population-dynamics-embeddings
    Explore at:
    zip (160984789 bytes), available download formats
    Dataset updated
    Dec 3, 2024
    Authors
    Ailurophile
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Population Dynamics Foundation Model (PDFM) Embeddings

    PDFM Embeddings are condensed vector representations designed to encapsulate the complex, multidimensional interactions among human behaviors, environmental factors, and local contexts at specific locations. These embeddings capture patterns in aggregated data such as search trends, busyness trends, and environmental conditions (maps, air quality, temperature), providing a rich, location-specific snapshot of how populations engage with their surroundings. Aggregated over space and time, these embeddings ensure privacy while enabling nuanced spatial analysis and prediction for applications ranging from public health to socioeconomic modeling.

    Overview

    PDFM Embeddings are generated using a Graph Neural Network (GNN) model, trained on a rich set of features:

    • Aggregated Search Trends: Regional interests and concerns reflected in search data.
    • Aggregated Maps Data: Geospatial and contextual data about locations.
    • Aggregated Busyness: Activity levels in specific areas, indicating density and frequency of human presence.
    • Aggregated Weather and Air Quality: Climate-related metrics, including temperature and air quality.

    These features are aggregated at the postal code and county levels to generate localized, context-aware embeddings that preserve privacy.

    Embeddings are available for all counties and ZIP codes within the contiguous United States. For additional coverage, please reach out to pdfm-embeddings@google.com.

    Paper

    For more information on PDFM Embeddings, please see our paper on arXiv.

    Applications

    PDFM Embeddings can be applied to a wide range of geospatial prediction tasks, similar to census and socioeconomic statistics. Example use cases include:

    • Population Health Outcomes: Predicting health statistics like disease prevalence or population health risks.
    • Socioeconomic Factors: Modeling economic indicators and living conditions.
    • Retail: Identifying promising locations for new stores, market expansion, and demand forecasting.
    • Marketing and Sales: Characterizing high-performance regions and identifying similar areas to optimize marketing and sales efforts.

    By incorporating spatial relationships and diverse feature types, these embeddings serve as a powerful tool for geospatial predictions.

    Getting Access to the Embeddings

    Access to Population Dynamics Embeddings is subject to Google’s Terms of Service. Users can download the embeddings and associated files after completing the intake form.

    Using the Embeddings

    Prepare Ground Truth Data

    To use Population Dynamics Embeddings, prepare ground truth data (e.g., target variable for prediction tasks like asthma prevalence) at the postal code or county level.

    Option 1: Incorporate Embeddings into an Existing Model

    1. Prepare Existing Model-Based Ground Truth: Use the embeddings as geospatial covariates to enhance an existing model.
    2. Train an Adapter Model: Improve an existing model by integrating the embeddings.

    Option 2: Tune for Specific Use Cases

    1. Choose a Prediction Model: Any model, such as GBDT, MLP, or linear, can be used for predictions.
    2. Use Embeddings for Prediction: Use PDFM Embeddings as input features, alongside other contextual data, to improve prediction accuracy.
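    Option 2 can be sketched as follows (synthetic data and an ordinary least squares model stand in for real PDFM embeddings and a GBDT/MLP; the shapes and target variable are assumptions for illustration):

```python
import numpy as np

# Synthetic stand-ins: one embedding row per ZIP code, and a target variable
# (e.g. a health statistic) that is a noisy linear function of the embedding.
rng = np.random.default_rng(0)
n_zips, dim = 500, 32
embeddings = rng.normal(size=(n_zips, dim))
true_w = rng.normal(size=dim)
target = embeddings @ true_w + rng.normal(scale=0.1, size=n_zips)

# Hold out 20% of ZIP codes, fit a linear model on the rest
split = int(0.8 * n_zips)
X_tr, y_tr = embeddings[:split], target[:split]
X_te, y_te = embeddings[split:], target[split:]
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# Evaluate on the held-out ZIP codes
pred = X_te @ w
r2 = 1 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print(f"held-out R^2: {r2:.3f}")
```

    In practice the embeddings would be joined to ground truth by postal code or county identifier, and any model (GBDT, MLP, linear) can replace the least-squares fit.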

    Demos / Notebooks

    Explore our demo notebooks to understand various use cases of PDFM Embeddings. The code provided is available under the Apache 2.0 license.

    • Nowcasting Colab: Here the model uses past and partial present-day data for a target variable at county level to predict outcomes for remaining counties.
    • Superresolution and Imputation Colab: Here we use the embeddings to help train a model at the county level on a target variable to predict at the zip code level. This model also demonstrates imputation capabilities by training on 20% of zip codes and predicting for the remaining 80%.
    • Forecasting with TimesFM Colab: In this experimental use case, we incorporate TimesFM (a Univariate Forecasting Model) to perform spatiotemporal forecasting. The embeddings are used to adjust for errors in the forecasts and improve their accuracy.
    • Nighttime Lights Prediction with Earth Engine...
  4. House prediction for zipcode

    • kaggle.com
    zip
    Updated Jan 16, 2019
    Cite
    abhi reddy (2019). House prediction for zipcode [Dataset]. https://www.kaggle.com/abhisheikreddy646/house-prediction-for-zipcode
    Explore at:
    zip (1860 bytes), available download formats
    Dataset updated
    Jan 16, 2019
    Authors
    abhi reddy
    Description

    Context

    House Price Prediction based on city zipcode...

    Content

    A home is often the largest and most expensive purchase a person makes in his or her lifetime. Ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost.

    Acknowledgements

    “Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

    Inspiration

    Zillow Prize, a competition with a one million dollar grand prize, is challenging the data science community to help push the accuracy of the Zestimate even further. Winning algorithms stand to impact

  5. Data from: ComEd's anonymized AMI energy usage data

    • openenergyhub.ornl.gov
    Updated Jul 30, 2024
    Cite
    (2024). ComEd's anonymized AMI energy usage data [Dataset]. https://openenergyhub.ornl.gov/explore/dataset/comed-s-anonymized-ami-energy-usage-data/
    Dataset updated
    Jul 30, 2024
    Description

    One of the key impacts of AMI technology is the availability of interval energy usage data, which can support the development of new products and services and enable the market to deliver greater value to customers. Requestors can now access anonymized interval energy usage data at 30-minute intervals for all zip codes where AMI meters have been deployed.
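    Rolling such 30-minute interval readings up to daily usage per zip code can be sketched like this (the column names `zip_code`, `interval_start`, and `kwh` are assumptions for illustration, not ComEd's published schema):

```python
import pandas as pd

# Toy sample of 30-minute AMI interval readings
readings = pd.DataFrame({
    "zip_code": ["60601"] * 4 + ["60602"] * 2,
    "interval_start": pd.to_datetime([
        "2024-07-01 00:00", "2024-07-01 00:30",
        "2024-07-01 01:00", "2024-07-02 00:00",
        "2024-07-01 00:00", "2024-07-01 00:30",
    ]),
    "kwh": [0.4, 0.5, 0.6, 0.7, 1.0, 1.2],
})

# Sum the 48 half-hour intervals of each day into a daily total per zip code
daily = (readings
         .assign(date=readings["interval_start"].dt.date)
         .groupby(["zip_code", "date"])["kwh"].sum()
         .reset_index())
print(daily)
```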

  6. Global Web Data | Web Scraping Data | Job Postings Data | Source: Company...

    • datarade.ai
    .json
    Cite
    PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 232M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads
    Explore at:
    .json, available download formats
    Dataset authored and provided by
    PredictLeads
    Area covered
    El Salvador, Comoros, Virgin Islands (British), Northern Mariana Islands, Bonaire, Kuwait, Bosnia and Herzegovina, Guadeloupe, French Guiana, Kosovo
    Description

    PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

    Key Features:

    ✅ 232M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
    ✅ 7.1M+ Active Job Openings – Updated in real time to reflect hiring demand.
    ✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
    ✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
    ✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
    ✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

    Primary Attributes:

    • id (string, UUID) – Unique identifier for the job posting.
    • type (string, constant: "job_opening") – Object type.
    • title (string) – Job title.
    • description (string) – Full job description, extracted from the job listing.
    • url (string, URL) – Direct link to the job posting.
    • first_seen_at – Timestamp when the job was first detected.
    • last_seen_at – Timestamp when the job was last detected.
    • last_processed_at – Timestamp when the job data was last processed.

    Job Metadata:

    • contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").
    • categories (array of strings) – Job categories (e.g., "engineering", "marketing").
    • seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").
    • status (string) – Job status (e.g., "open", "closed").
    • language (string) – Language of the job posting.
    • location (string) – Full location details as listed in the job description.
    • location_data (array of objects) – Structured location details:
      • city (string, nullable) – City where the job is located.
      • state (string, nullable) – State or region of the job location.
      • zip_code (string, nullable) – Postal/ZIP code.
      • country (string, nullable) – Country where the job is located.
      • region (string, nullable) – Broader geographical region.
      • continent (string, nullable) – Continent name.
      • fuzzy_match (boolean) – Indicates whether the location was inferred.

    Salary Data (salary_data)

    • salary (string) – Salary range extracted from the job listing.
    • salary_low (float, nullable) – Minimum salary in original currency.
    • salary_high (float, nullable) – Maximum salary in original currency.
    • salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").
    • salary_low_usd (float, nullable) – Converted minimum salary in USD.
    • salary_high_usd (float, nullable) – Converted maximum salary in USD.
    • salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

    Occupational Data (onet_data) (object, nullable)

    • code (string, nullable) – ONET occupation code.
    • family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").
    • occupation_name (string, nullable) – Official ONET occupation title.

    Additional Attributes:

    • tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").

    📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

    PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset
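    The record shape described above could be modeled as follows (a sketch using Python dataclasses with a subset of the listed fields; this is illustrative, not PredictLeads' official client library, and the sample values are made up):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LocationData:
    # All location fields are nullable in the source schema
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None
    country: Optional[str] = None
    region: Optional[str] = None
    continent: Optional[str] = None
    fuzzy_match: bool = False  # True when the location was inferred

@dataclass
class JobOpening:
    id: str          # UUID of the posting
    type: str        # constant: "job_opening"
    title: str
    url: str
    contract_types: list = field(default_factory=list)
    seniority: Optional[str] = None
    status: Optional[str] = None
    location_data: list = field(default_factory=list)

job = JobOpening(
    id="123e4567-e89b-12d3-a456-426614174000",
    type="job_opening",
    title="Data Engineer",
    url="https://example.com/jobs/1",
    location_data=[LocationData(city="Austin", state="TX", zip_code="78701")],
)
print(job.location_data[0].zip_code)  # 78701
```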

  7. Restaurants on Yellowpages.com

    • kaggle.com
    zip
    Updated Sep 15, 2017
    Cite
    PromptCloud (2017). Restaurants on Yellowpages.com [Dataset]. https://www.kaggle.com/PromptCloudHQ/restaurants-on-yellowpagescom
    Explore at:
    zip (464849 bytes), available download formats
    Dataset updated
    Sep 15, 2017
    Dataset authored and provided by
    PromptCloud
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset comprises a comprehensive collection of restaurant listings from Yellowpages.com, capturing essential information such as restaurant names, locations, contact details, cuisine types, and customer ratings. This data is invaluable for those looking to analyze trends in the restaurant industry, perform competitive analysis, or build recommendation systems.

    This dataset has the following fields:

    • Url
    • Name
    • Street
    • Zip Code
    • City
    • State
    • Phone
    • Email
    • Website
    • Categories - A comma-delimited (,) list of categories the listing in question falls under. Most listings are placed in multiple categories.
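    The comma-delimited Categories field can be split into individual labels like so (the sample row is made up for illustration):

```python
# One listing row, with Categories as a comma-delimited string
row = {
    "Name": "Joe's Diner",
    "Zip Code": "10001",
    "Categories": "Restaurants,American Restaurants,Breakfast & Brunch",
}

# Split on commas and strip stray whitespace around each label
categories = [c.strip() for c in row["Categories"].split(",")]
print(categories)  # ['Restaurants', 'American Restaurants', 'Breakfast & Brunch']
```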

    Whether you’re studying the geographic distribution of restaurants, examining customer preferences, or developing location-based services, this dataset provides a robust foundation for your project. It offers a snapshot of the restaurant landscape as captured from a widely-used business directory, making it a versatile resource for various analytical and development purposes.

    https://www.promptcloud.com/datastock-access-ready-to-use-datasets/?utm_source=yp-kaggle&utm_medium=referral

    For more specific or up-to-date data, or if you need tailored datasets from other platforms, consider leveraging custom web scraping services. PromptCloud offers flexible and scalable data extraction solutions to meet your unique needs, allowing you to focus on analysis and decision-making without worrying about data collection. https://www.promptcloud.com/web-scraping-services/

  8. setfit-0-1-1-zip-package

    • kaggle.com
    zip
    Updated Oct 7, 2022
    Cite
    Paolo Rechia (2022). setfit-0-1-1-zip-package [Dataset]. https://www.kaggle.com/datasets/paolorechia/setfit-0-1-1-zip-package
    Explore at:
    zip (937616396 bytes), available download formats
    Dataset updated
    Oct 7, 2022
    Authors
    Paolo Rechia
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a dataset I'm using to run an inference notebook using setfit (sentence transformers framework). Framework github: https://github.com/huggingface/setfit

    This process is necessary when you're using setfit in a competition where internet access is not allowed.

    To install the setfit package using this dataset, you need the following code (it's very dirty, I didn't bother cleaning it up):

    # To build the package dataset, this was used:
    # !pip download setfit --dest "/kaggle/working/setfit-package"
    # !zip -r "/kaggle/working/setfit-package.zip" "/kaggle/working/setfit-package"

    import os
    import shutil

    print("Copying packages...")
    print(os.listdir("/kaggle"))

    # Create the target directories (no error if they already exist)
    os.makedirs("/kaggle/working/packages", exist_ok=True)

    print(os.listdir("/kaggle/input/setfit-0-1-1-zip-package/setfit-package/kaggle"))

    # Copy the downloaded package tree and the sentence-transformers wheel
    try:
        shutil.copytree(
            "/kaggle/input/setfit-0-1-1-zip-package/setfit-package/kaggle/working/setfit-package",
            "/kaggle/working/packages/setfit-package")
    except FileExistsError:
        pass
    shutil.copy(
        "/kaggle/input/setfit-0-1-1-zip-package/sentence_transformers-2.2.2-py3-none-any.whl",
        "/kaggle/working/packages/setfit-package/sentence_transformers-2.2.2-py3-none-any.whl")
    print("Copied!")

    print(os.listdir("/kaggle/working/packages/setfit-package"))

    # Install offline from the local package directory
    !pip install --no-index --find-links=file:///kaggle/working/packages/setfit-package setfit

    And then to import the model to run inference:

    import sys

    # Mock the evaluate library, since it brings version compatibility errors,
    # and we're not evaluating anything here...
    from unittest.mock import MagicMock
    sys.modules["evaluate"] = MagicMock()

    # Ok, now we need the library installed
    from setfit import SetFitModel

    # You should also have weights available somewhere.
    # I didn't upload the original weights since I'm training locally,
    # i.e., I only upload the final weights to Kaggle.
    model_path = "/path/to/your/pretrained_weights"
    model = SetFitModel.from_pretrained(model_path)
