11 datasets found
  1. Full US Phone Number and Telecom Data | 387,543,864 Phones | Full USA...

    • datarade.ai
    .json, .csv, .xls
    Updated Aug 12, 2023
    Cite
    CompCurve (2023). Full US Phone Number and Telecom Data | 387,543,864 Phones | Full USA Coverage | Mobile and Landline with Carrier | 100% Verifiable Data [Dataset]. https://datarade.ai/data-products/full-us-phone-number-and-telecom-data-387-543-864-phones-compcurve
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Aug 12, 2023
    Dataset authored and provided by
    CompCurve
    Area covered
    United States
    Description

    This comprehensive dataset delivers 387M+ U.S. phone numbers enriched with deep telecom intelligence and granular geographic metadata, providing one of the most complete national phone data assets available today. Designed for data enrichment, verification, identity resolution, analytics, risk modeling, telecom research, and large-scale customer intelligence, this file combines broad coverage with highly structured attributes and reliable carrier-grade metadata. It is a powerful resource for any organization that needs accurate, up-to-date U.S. phone number data supported by robust telecom identifiers.

    Our dataset includes mobile, landline, and VOIP numbers, paired with detailed fields such as carrier, line type, city, state, ZIP code, county, latitude/longitude, time zone, rate center, LATA, and OCN. These attributes make the file suitable for a wide range of applications, from consumer analytics and segmentation to identity graph construction and marketing audience modeling. Updated regularly and validated for completeness, this dataset offers high-confidence coverage across all 50 states, major metros, rural areas, and underserved regions.

    Field Coverage & Schema Overview

    The dataset contains a rich set of fields commonly required for telecom analysis, identity resolution, and large-scale data cleansing:

    Phone Number – Standardized 10-digit U.S. number

    Line Type – Wireless, Landline, VOIP, fixed-wireless, etc.

    Carrier / Provider – Underlying or current carrier assignment

    City & State – Parsed from rate center and location metadata

    ZIP Code – Primary ZIP associated with the phone block

    County – County name mapped to geographic area

    Latitude / Longitude – Approximate geo centroid for the assigned location

    Time Zone – Automatically mapped; useful for outbound compliance

    Rate Center – Telco rate center tied to number blocks

    LATA – Local Access and Transport Area for telecom routing

    OCN (Operating Company Number) – Carrier identifier for precision analytics

    Additional Metadata – Region codes, telecom identifiers, and national routing attributes, depending on the number block

    These data points provide a complete snapshot of the phone number’s telecom context and geographic footprint.
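
    Fields like these can be cross-checked with open tooling. As a minimal sketch, the open-source phonenumbers Python library (not affiliated with this dataset) validates a standardized 10-digit U.S. number and looks up line type, carrier, and time-zone metadata; the example number is fictional:

        import phonenumbers
        from phonenumbers import carrier, geocoder, timezone

        # Fictional example number in standardized 10-digit U.S. form.
        num = phonenumbers.parse("2025550143", "US")

        print(phonenumbers.is_valid_number(num))           # structural validity
        print(phonenumbers.number_type(num))               # line type (mobile, fixed line, VoIP, ...)
        print(carrier.name_for_number(num, "en"))          # carrier name, where known
        print(geocoder.description_for_number(num, "en"))  # coarse geographic description
        print(timezone.time_zones_for_number(num))         # time zone(s) for outreach compliance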

    Key Features

    387M+ fully structured U.S. phone numbers

    Mobile, landline, and VOIP line types

    Accurate carrier and OCN information

    Geo-enriched records with city, state, ZIP, county, lat/long

    Telecom routing metadata including rate center and LATA

    Ideal for large-scale analytics, enrichment, and modeling

    Nationwide coverage with consistent formatting and schema

    Primary Use Cases

    1. Data Enrichment & Appending

    Enhance customer databases by adding carrier information, line type, geographic attributes, and telecom routing fields to improve downstream analytics and segmentation.

    2. Identity Resolution & Profile Matching

    Use carrier, OCN, and geographic fields to strengthen your identity graph, resolve duplicate entities, confirm telephone types, or enrich cross-channel identifiers.

    3. Lead Scoring & Consumer Modeling

    Build predictive models based on:

    Line type (mobile vs landline)

    Geography (state, county, ZIP)

    Telecom infrastructure and regional carrier assignments

    Useful for ML/AI scoring, propensity models, risk analysis, and customer lifetime value studies.

    4. Compliance-Aware Outreach Planning

    Fields like time zone, rate center, and line type support compliant outbound operations, call scheduling, and segmentation of mobile vs landline users for regulated environments.

    5. Data Quality, Cleansing & Validation

    Normalize customer files, detect outdated or mismatched phone metadata, resolve carrier inconsistencies, and remove non-U.S. or structurally invalid numbers.

    6. Telecom Market Analysis

    Researchers and telecom analysts can use the dataset to understand national carrier distribution, regional line-type patterns, infrastructure growth, and switching behavior.

    7. Fraud Detection & Risk Intelligence

    Carrier metadata, OCN patterns, and geographic context support:

    Synthetic identity detection

    Fraud scoring models

    Device/number reputation systems

    VOIP risk modeling

    8. Location-Based Analytics & Mapping

    Lat/long and geographic context fields allow integration into GIS systems, heat-mapping, regional modeling, and ZIP- or county-level segmentation.

    9. Customer Acquisition & Audience Building

    Build highly targeted audiences for:

    Marketing analytics

    Look-alike modeling

    Cross-channel segmentation

    Regional consumer insights

    10. Enterprise-Scale ETL & Data Infrastructure

    The structured, normalized schema makes this file easy to integrate into:

    Data lakes

    Snowflake / BigQuery warehouses

    ID graphs

    Customer 360 platforms

    Telecom research systems

    Ideal Users

    Marketing analytics teams

    Data science groups

    Identity resolution providers

    Fraud & risk intelligence platforms

    Telecom analysts

    Consumer data platforms

    Credit, insurance, and fintech modeling teams

    Data brokers & a...

  2. Connecticut Residential Real Estate 2011-2021

    • kaggle.com
    zip
    Updated Jan 31, 2023
    Cite
    Asa Sherwyn (2023). Connecticut Residential Real Estate 2011-2021 [Dataset]. https://www.kaggle.com/datasets/asasherwyn/ctrre-2011-2021
    Explore at:
    Available download formats: zip (14354051 bytes)
    Dataset updated
    Jan 31, 2023
    Authors
    Asa Sherwyn
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Connecticut
    Description

    All data compiled into this dataset is in the public domain. This set is designed to provide some insight into sales trends across the state of Connecticut, as well as the individual towns within it. It is also specifically structured to highlight changes in trends due to the COVID-19 pandemic.

    Variables

    list_year: grand list year of the property (grand list years run from Oct. 1 through Sept. 30).
    town: name of the town that the property was sold in.
    population: population of the town that the property was sold in.
    residential_type: single family, two family, three family, four family, or condo.
    month: the month the sale was recorded.
    year: the year the sale was recorded.
    in_pandemic: boolean value indicating whether the selling date was after March 11, 2020.
    assessed_value: tax-assessed value of the property at the time of the sale.
    sale_amount: final closing sale amount of the property.
    price_index: the Consumer Price Index (CPI) for that month/year, used to normalize dollar values.
    norm_assessed_value: CPI-normalized assessed value (assessed_value / price_index * 100).
    norm_sale_amount: CPI-normalized sale amount (sale_amount / price_index * 100).
    norm_sales_ratio: CPI-normalized assessment-to-sale ratio (norm_assessed_value / norm_sale_amount).
    latitude: latitude for the property's town.
    longitude: longitude for the property's town.
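
    The CPI-normalized fields follow directly from the raw columns. A minimal pandas sketch (the CSV filename is an assumption; adjust to the downloaded file):

        import pandas as pd

        # Filename is an assumption; adjust to the downloaded file.
        df = pd.read_csv("ct_residential_2011_2021.csv")

        # Dollar values are rescaled to a CPI base of 100, per the definitions above.
        df["norm_assessed_value"] = df["assessed_value"] / df["price_index"] * 100
        df["norm_sale_amount"] = df["sale_amount"] / df["price_index"] * 100

        # Assessment-to-sale ratio on normalized values; the CPI factor cancels,
        # so this equals assessed_value / sale_amount.
        df["norm_sales_ratio"] = df["norm_assessed_value"] / df["norm_sale_amount"]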

    Note: the original dataset also contained the street address and exact sale date for each record. Those variables were removed because they were not relevant to the analysis being conducted, and to afford the individuals associated with each sale a stronger degree of personal privacy. Records from October 2000 to October 2010 in the original dataset were omitted due to timeliness issues. Records of non-residential types were omitted because they lacked enough historic records to be of consequence to the analysis.

    Data sources

    Real estate records: https://data.ct.gov/Housing-and-Development/Real-Estate-Sales-2001-2020-GL/5mzw-sjtu
    Township shapes: https://data.ct.gov/Government/Town-Boundary-Index-Map/evyv-fqzg
    Consumer price index: https://www.bls.gov/regions/new-england/data/consumerpriceindex_us_table.htm
    Town populations: https://www.connecticut-demographics.com/cities_by_population

  3. Data for: Trends, Reversion, and Critical Phenomena in Financial Markets

    • data.mendeley.com
    Updated Dec 11, 2020
    Cite
    Christof Schmidhuber (2020). Data for: Trends, Reversion, and Critical Phenomena in Financial Markets [Dataset]. http://doi.org/10.17632/v73nzdt7rt.1
    Explore at:
    Dataset updated
    Dec 11, 2020
    Authors
    Christof Schmidhuber
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    These data accompany the publication "Trends, Reversion, and Critical Phenomena in Financial Markets".

    They contain daily data from Jan 1992 to Dec 2019 on 24 financial markets, namely

    • 6 equity indices: S&P 500, TSE 60, DAX 30, FTSE 100, Nikkei 225, Hang Seng
    • 6 interest rates for government bonds: US 10-year, Canada 10-year, Germany 10-year, UK 10-year, Japan 10-year, Australia 3-year
    • 6 FX rates: CAD/USD, EUR/USD, GBP/USD, JPY/USD, AUD/USD, NZD/USD
    • 6 commodities: Crude Oil, Natural Gas, Gold, Copper, Soybeans, Live Cattle

    The data are provided in 13 columns:

    • Column 1: date
    • Column 2: market
    • Column 3: daily log return of futures on that market, normalized to have mean 0 and standard deviation 1 over the 28-year time period
    • Columns 4-13: trend strengths in that market over 10 different time horizons of (2,4,8,16,32,64,128,256,512,1024) business days.

    The trend strengths are defined in the accompanying paper. They are cut off at plus/minus 2.5. The daily log returns were computed from daily futures prices, rolled 5 days prior to first notice, which were taken from Bloomberg. The following mean returns and volatilities were used to normalize the daily log returns in column 3:

    Market         Mean      St. Dev.
    S&P 500        2.217%    1.100%
    TSE 60         2.416%    1.067%
    DAX 30         1.199%    1.366%
    FTSE 100       1.053%    1.103%
    Nikkei 225    -0.483%    1.486%
    Hang Seng      0.768%    1.674%
    US 10-year     3.734%    0.366%
    Can. 10-year   3.637%    0.376%
    Ger. 10-year   4.141%    0.337%
    UK 10-year     2.983%    0.419%
    Jap. 10-year   4.453%    0.249%
    Aus. 3-year    3.029%    0.074%
    CAD/USD        0.048%    0.479%
    EUR/USD       -0.222%    0.619%
    GBP/USD        0.316%    0.597%
    JPY/USD       -0.761%    0.667%
    AUD/USD        0.851%    0.725%
    NZD/USD        1.563%    0.724%
    Crude Oil      0.093%    2.243%
    Natural Gas   -2.649%    2.985%
    Gold           0.580%    0.987%
    Copper         0.936%    1.586%
    Soybeans       0.631%    1.360%
    Live Cattle    0.483%    0.894%
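
    The normalization in column 3 is a plain standardization of daily log returns. A minimal numpy/pandas sketch on a synthetic price series (the published per-market means and standard deviations above are what the dataset actually used):

        import numpy as np
        import pandas as pd

        # Synthetic daily futures prices stand in for the Bloomberg series.
        rng = np.random.default_rng(0)
        prices = pd.Series(100 * np.exp(np.cumsum(0.01 * rng.standard_normal(500))))

        # Daily log returns, standardized to mean 0 and standard deviation 1
        # over the full sample (column 3 of the dataset).
        log_ret = np.log(prices).diff().dropna()
        norm_ret = (log_ret - log_ret.mean()) / log_ret.std()

        # The trend strengths in columns 4-13 are cut off at +/-2.5;
        # the same clipping operation:
        clipped = norm_ret.clip(-2.5, 2.5)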

  4. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Total Inorganic Nitrogen [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States, United States
    Description

    This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Total Inorganic Nitrogen for the year 2002, compiled for every catchment of NHDPlus for the conterminous United States. Estimates of Total Inorganic Nitrogen deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations.

    The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007).

    The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.

  5. Data from: Attributes for NHDPlus Catchments (Version 1.1) for the...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Oct 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Ammonium (NH4) [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize-dafbc
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States, Contiguous United States
    Description

    This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Ammonium (NH4) for the year 2002, compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NH4 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations.

    The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007).

    The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.

  6. Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Oct 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Nitrate (NO3) [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize-781ec
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States, Contiguous United States
    Description

    This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Nitrate (NO3) for the year 2002, compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NO3 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations.

    The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007).

    The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs; Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.

  7. Student Academic Performance (Synthetic Dataset)

    • kaggle.com
    zip
    Updated Oct 10, 2025
    Cite
    Mamun Hasan (2025). Student Academic Performance (Synthetic Dataset) [Dataset]. https://www.kaggle.com/datasets/mamunhasan2cs/student-academic-performance-synthetic-dataset
    Explore at:
    Available download formats: zip (9287 bytes)
    Dataset updated
    Oct 10, 2025
    Authors
    Mamun Hasan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a synthetic collection of student performance data created for data preprocessing, cleaning, and analysis practice in Data Mining and Machine Learning courses. It contains information about 1,020 students, including their study habits, attendance, and test performance, with intentionally introduced missing values, duplicates, and outliers to simulate real-world data issues.

    The dataset is suitable for laboratory exercises, assignments, and demonstration of key preprocessing techniques such as:

    • Handling missing values
    • Removing duplicates
    • Detecting and treating outliers
    • Data normalization and transformation
    • Encoding categorical variables
    • Exploratory data analysis (EDA)
    • Regression Analysis

    šŸ“Š Columns Description

    Student_ID – Unique identifier for each student (e.g., S0001, S0002, …)
    Age – Age of the student (between 18 and 25 years)
    Gender – Gender of the student (Male/Female)
    Study_Hours – Average number of study hours per day (contains missing values and outliers)
    Attendance(%) – Percentage of class attendance (contains missing values)
    Test_Score – Final exam score (0–100 scale)
    Grade – Letter grade derived from test scores (F, C, B, A, A+)

    🧠 Example Lab Tasks Using This Dataset:

    • Identify and impute missing values using mean/median.
    • Detect and remove duplicate records.
    • Use IQR or Z-score methods to handle outliers.
    • Normalize Study_Hours and Test_Score using Min-Max scaling.
    • Encode categorical variables (Gender, Grade) for model input.
    • Prepare a clean dataset ready for classification/regression analysis.
    • Can be used for limited regression (see the preprocessing sketch below)
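
    A minimal pandas/scikit-learn sketch of the tasks above (the CSV filename is an assumption; column names follow the schema listed earlier):

        import pandas as pd
        from sklearn.preprocessing import MinMaxScaler

        # Filename is an assumption; adjust to the downloaded file.
        df = pd.read_csv("student_academic_performance.csv")

        # Impute missing values with the median; drop duplicate records.
        for col in ["Study_Hours", "Attendance(%)"]:
            df[col] = df[col].fillna(df[col].median())
        df = df.drop_duplicates(subset="Student_ID")

        # IQR rule to drop Study_Hours outliers.
        q1, q3 = df["Study_Hours"].quantile([0.25, 0.75])
        iqr = q3 - q1
        df = df[df["Study_Hours"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

        # Min-Max scale the numeric features to [0, 1].
        cols = ["Study_Hours", "Test_Score"]
        df[cols] = MinMaxScaler().fit_transform(df[cols])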

    šŸŽÆ Possible Regression Targets

    Test_Score → Predict test score based on study hours, attendance, age, and gender.

    🧩 Example Regression Problem

    Predict the student’s test score using their study hours, attendance percentage, and age.

    🧠 Sample Features:

    X = ['Age', 'Gender', 'Study_Hours', 'Attendance(%)']
    y = ['Test_Score']

    You can use:

    • Linear Regression (for simplicity)
    • Polynomial Regression (to explore nonlinear patterns)
    • Decision Tree Regressor or Random Forest Regressor

    And analyze feature influence using correlation or SHAP/LIME explainability.
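
    A minimal scikit-learn sketch of that regression setup, with one-hot encoding for Gender (the filename is again an assumption):

        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import OneHotEncoder

        df = pd.read_csv("student_academic_performance.csv").dropna()
        X = df[["Age", "Gender", "Study_Hours", "Attendance(%)"]]
        y = df["Test_Score"]

        # One-hot encode the categorical Gender column; pass the rest through.
        pre = ColumnTransformer(
            [("gender", OneHotEncoder(drop="if_binary"), ["Gender"])],
            remainder="passthrough",
        )
        model = Pipeline([("pre", pre), ("reg", LinearRegression())])

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        model.fit(X_train, y_train)
        print("Held-out R^2:", model.score(X_test, y_test))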

  8. DDSP EMG dataset.xlsx

    • commons.datacite.org
    • figshare.com
    Updated Jul 14, 2019
    + more versions
    Cite
    Marta Cercone (2019). DDSP EMG dataset.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.8864411
    Explore at:
    Dataset updated
    Jul 14, 2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Figshare (http://figshare.com/)
    Authors
    Marta Cercone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study was performed in accordance with the PHS Policy on Humane Care and Use of Laboratory Animals and with federal and state regulations, and was approved by the Institutional Animal Care and Use Committee (IACUC) of Cornell University and the Ethics and Welfare Committee at the Royal Veterinary College.

    Study design: Adult horses were recruited if in good health, following evaluation of the upper airways through endoscopic exam at rest and during exercise, either overground or on a high-speed treadmill, using a wireless videoendoscope. Horses were categorized as "DDSP" affected if they consistently presented with exercise-induced intermittent dorsal displacement of the soft palate during multiple (n=3) exercise tests, or as "control" horses if they did not experience dorsal displacement of the soft palate during exercise and had no signs compatible with DDSP, such as palatal instability during exercise or soft palate or sub-epiglottic ulcerations. Horses were instrumented with intramuscular electrodes in one or both thyro-hyoid (TH) muscles for EMG recording, hard-wired to a wireless transmitter for remote recording implanted in the cervical area. EMG recordings were then made during an incremental exercise test based on the percentage of maximum heart rate (HRmax).

    Incremental exercise test: After surgical instrumentation, each horse performed a 4-step incremental test while recording TH electromyographic activity, heart rate, upper airway videoendoscopy, pharyngeal airway pressures, and gait frequency measurements. Horses were evaluated at exercise intensities corresponding to 50, 80, 90 and 100% of their maximum heart rate, with each speed maintained for 1 minute. Laryngeal function during the incremental test was recorded using a wireless videoendoscope (Optomed, Les Ulis, France), which was placed into the nasopharynx via the right ventral nasal meatus. Nasopharyngeal pressure was measured using a Teflon catheter (1.3 mm ID, Neoflon) inserted through the left ventral nasal meatus to the level of the left guttural pouch ostium. The catheter was attached to differential pressure transducers (Celesco LCVR, Celesco Transducers Products, Canoga Park, CA, USA) referenced to atmospheric pressure and calibrated from -70 to 70 mmHg. Occurrence of episodes of dorsal displacement of the soft palate was recorded, and the number of swallows during each exercise trial was counted for each speed interval.

    EMG recording: EMG data were recorded through a wireless transmitter device implanted subcutaneously. Two different transmitters were used: 1) TR70BB (Telemetry Research Ltd, Auckland, New Zealand) with 12-bit A/D conversion resolution, AC-coupled amplifier, -3 dB point at 1.5 Hz, and 2 kHz sampling frequency (n=5 horses); or 2) ELI (Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria) [23], with 12-bit A/D conversion resolution, AC-coupled amplifier, amplifier gain 1450, and 1 kHz sampling frequency (n=4 horses). The EMG signal was transmitted through a receiver (TR70BB) or Bluetooth (ELI) to a data acquisition system (PowerLab 16/30 - ML880/P, ADInstruments, Bella Vista, Australia). The EMG signal was amplified with an octal bio-amplifier (Octal Bioamp, ML138, ADInstruments, Bella Vista, Australia) with a bandwidth frequency ranging from 20-1000 Hz (input impedance = 200 MΩ, common mode rejection ratio = 85 dB, gain = 1000), and transmitted to a personal computer. All EMG and pharyngeal pressure signals were collected at a 2000 Hz rate with LabChart 6 software (ADInstruments, Bella Vista, Australia), which allows for real-time monitoring and storage for post-processing and analysis.

    EMG signal processing: Electromyographic signals from the TH muscles were processed using two methods: 1) a classical approach to myoelectrical activity and median frequency, and 2) wavelet decomposition. For both methods, the beginning and end of recording segments including twenty consecutive breaths, at the end of each speed interval, were marked with comments in the acquisition software (LabChart). The relationship of EMG activity with the phase of the respiratory cycle was determined by comparing pharyngeal pressure waveforms with the raw EMG and time-averaged EMG traces. For the classical approach, in a graphical user interface-based software (LabChart), a sixth-order Butterworth filter was applied (common mode rejection ratio, 90 dB; band pass, 20 to 1,000 Hz); the EMG signal was then amplified, full-wave rectified, and smoothed using a triangular Bartlett window (time constant: 150 ms). The digitized area under the time-averaged full-wave rectified EMG signal was calculated to define the raw mean electrical activity (MEA) in mV.s. Median Power Frequency (MF) of the EMG power spectrum was calculated after a Fast Fourier Transformation (1024 points, Hann cosine window processing). For the wavelet decomposition, the whole dataset including comments and comment locations was exported as .mat files for processing in MATLAB R2018a with the Signal Processing Toolbox (The MathWorks Inc, Natick, MA, USA). A custom-written automated script based on Hodson-Tole & Wakeling [24] was used to first cut the .mat file into the selected 20-breath segments and subsequently process each segment. A bank of 16 wavelets with time and frequency resolution optimized for EMG was used; the center frequencies of the bank ranged from 6.9 Hz to 804.2 Hz [25]. The intensity was summed (mV2) to a total, and the intensity contribution of each wavelet was calculated across all 20 breaths for each horse, with separate results for each trial date and exercise level (80, 90, 100% of HRmax, as well as the period preceding episodes of DDSP). To determine the relevant bandwidths for the analysis, a Fast Fourier transform frequency analysis was performed on the horses unaffected by DDSP from 0 to 1000 Hz in increments of 50 Hz, and the contribution of each interval was calculated in percent of the total spectrum as median and interquartile range. According to the Shannon-Nyquist sampling theorem, the relevant signal is below half the sample rate, and because we had instrumentation sampling at either 1000 Hz or 2000 Hz, we chose to perform the frequency analysis up to 1000 Hz. The 0-50 Hz interval, mostly stride frequency and background noise, was excluded from further analysis. Of the remaining frequency spectrum, we included all intervals from 50-100 Hz to 450-500 Hz and excluded the remainder because they contributed less than 5% to the total amplitude.

    Data analysis: At the end of each exercise speed interval, twenty consecutive breaths were selected and analyzed as described above. To standardize MEA, MF and mV2 within and between horses and trials, and to control for different electrode sizes (i.e., different impedance and area of sampling), data were afterward normalized to the value at 80% of HRmax (HRmax80), referred to as normalized MEA (nMEA), normalized MF (nMF) and normalized mV2 (nmV2). During the initial processing, it became clear that the TH muscle is inconsistently activated at 50% of HRmax, and that speed level was therefore excluded from further analysis.

    The endoscopy video was reviewed and episodes of palatal displacement were marked with comments. For both the classical approach and the wavelet analysis, an EMG segment preceding and concurrent to the DDSP episode was analyzed. If multiple episodes were recorded during the same trial, only the period preceding the first palatal displacement was analyzed. In horses that had both TH muscles implanted, the average between the two sides was used for the analysis. Averaged data from multiple trials were considered for each horse. Descriptive data are expressed as means with standard deviation (SD). Normal distribution of data was assessed using the Kolmogorov-Smirnov test and quantile-quantile (Q-Q) plots. To determine the frequency clusters in the EMG signal, a hierarchical agglomerative dendrogram was applied using the packages Matplotlib, pandas, numpy and scipy in Python (version 3.6.6), executed through Spyder (version 3.2.2) and Anaconda Navigator. Based on the frequency analysis, the wavelets included in the cluster analysis were 92.4 Hz, 128.5 Hz, 170.4 Hz, 218.1 Hz, 271.5 Hz, 330.6 Hz, 395.4 Hz and 465.9 Hz. The number of frequency clusters was set to two based on maximum acceleration in a scree plot and maximum vertical distance in the dendrogram. For continuous outcome measures (number of swallows, MEA, MF, and mV2), a mixed effect model was fitted to the data to determine the relationship between the outcome variable and relevant fixed effects (breed, sex, age, weight, speed, group), using horse as a random effect. Tukey's post hoc tests and linear contrasts were used as appropriate. Statistical analysis was performed using JMP Pro 13 (SAS Institute, Cary, NC, USA). Significance was set at P < 0.05 throughout.
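
    A minimal numpy/scipy sketch of the classical processing chain described above, run on a synthetic trace (the 20-1000 Hz band edge is capped just below the 1 kHz Nyquist limit of the 2 kHz example rate):

        import numpy as np
        from scipy.signal import butter, filtfilt, welch

        fs = 2000                                   # Hz, example sampling rate
        rng = np.random.default_rng(0)
        emg = rng.standard_normal(10 * fs)          # synthetic stand-in for a raw EMG trace

        # Sixth-order Butterworth band-pass; 990 Hz instead of 1000 Hz because a
        # digital filter edge must stay strictly below fs/2.
        b, a = butter(6, [20, 990], btype="bandpass", fs=fs)
        filtered = filtfilt(b, a, emg)

        # Full-wave rectification, then smoothing with a 150 ms triangular
        # (Bartlett) window to obtain the time-averaged envelope.
        win = np.bartlett(int(0.150 * fs))
        win /= win.sum()
        envelope = np.convolve(np.abs(filtered), win, mode="same")

        # Mean electrical activity (MEA): area under the envelope (mV.s if input is in mV).
        mea = np.trapz(envelope, dx=1 / fs)

        # Median power frequency (MF) from the power spectrum.
        freqs, psd = welch(filtered, fs=fs, nperseg=1024)
        cum = np.cumsum(psd)
        mf = freqs[np.searchsorted(cum, cum[-1] / 2)]
        print(f"MEA = {mea:.3f}, MF = {mf:.1f} Hz")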

  9. Global Hospital Beds Capacity (for covid-19)

    • kaggle.com
    zip
    Updated Apr 26, 2020
    Cite
    Igor Kiulian (2020). Global Hospital Beds Capacity (for covid-19) [Dataset]. https://www.kaggle.com/ikiulian/global-hospital-beds-capacity-for-covid19
    Explore at:
    Available download formats: zip (290457 bytes)
    Dataset updated
    Apr 26, 2020
    Authors
    Igor Kiulian
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DISCLAIMER

    This dataset consists of historical data from the pre-pandemic period and does not represent the current reality, which may have changed due to spikes in demand. It was generated through a collaborative effort within the CoronaWhy community.

    Context

    Last updated: April 26th, 2020

    Updates:
    • April 14th, 2020 - Added missing population data
    • April 15th, 2020 - Added Brazil statewise ICU hospital beds dataset
    • April 21st, 2020 - Added Italy and Spain statewise ICU hospital beds datasets, and India statewise TOTAL hospital beds dataset
    • April 26th, 2020 - Added Sweden ICU (2019) and TOTAL (2018) beds datasets

    Purpose of the dataset

    I am trying to produce a dataset that will provide a foundation for policymakers to understand the realistic capacity of healthcare providers to deal with spikes in demand for intensive care. As a way to help, I've prepared a dataset of beds across countries and states. This is a work-in-progress dataset that will be updated weekly as more data becomes available and public.

    Importance

    This dataset is intended to be used as a baseline for understanding the typical bed capacity and coverage globally. This information is critical for understanding the impact of a high utilization event, like COVID-19.

    Current challenges

    Datasets are scattered across the web and are very hard to normalize; I did my best, but help would be much appreciated.

    Data sources / Acknowledgments

    • arcgis (USA): https://services1.arcgis.com/Hp6G80Pky0om7QvQ/arcgis/rest/services/Hospitals_1/FeatureServer/0
    • KHN (USA): https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/
    • datahub.io (World): https://datahub.io/world-bank/sh.med.beds.zs
    • eurostat: https://data.europa.eu/euodp/en/data/dataset/vswUL3c6yKoyahrvIRyew
    • OECD: https://data.oecd.org/healtheqt/hospital-beds.htm
    • WDI (World): https://data.worldbank.org/indicator/SH.MED.BEDS.ZS
    • NHP (India): http://www.cbhidghs.nic.in/showfile.php?lid=1147
    • data.gov.sg (Singapore): https://data.gov.sg/dataset/health-facilities?view_id=91b4feed-dcb9-4720-8cb0-ac2f04b7efd0&resource_id=dee5ccce-4dfb-467f-bcb4-dc025b56b977
    • dati.salute.gov.it (Italy): http://www.dati.salute.gov.it/dati/dettaglioDataset.jsp?menu=dati&idPag=96
    • portal.icuregswe.org (Sweden): https://portal.icuregswe.org/seiva/en/Rapport

    Publications:
    • Intensive Care Medicine Journal (Europe): https://link.springer.com/article/10.1007/s00134-012-2627-8
    • Critical Care Medicine Journal (Asia): https://www.researchgate.net/figure/Number-of-critical-care-beds-per-100-000-population_fig1_338520008
    • Medicina Intensiva (Spain): https://www.medintensiva.org/en-pdf-S2173572713000878

    News:
    • https://lanuovaferrara.gelocal.it/italia-mondo/cronaca/2020/03/19/news/dietro-la-corsa-a-nuovi-posti-in-terapia-intensiva-gli-errori-del-passato-1.38611596

    Kaggle:
    • Germany: https://www.kaggle.com/manuelblechschmidt/icu-beds-in-germany
    • Brazil (IBGE): https://www.kaggle.com/thiagobodruk/brazilianstates

    Other:
    • Manual population data search from Wikipedia

    Data columns

    Columns: country, state, county, lat, lng, type, measure, beds, population, year, source, source_url

    • country - country of origin, if present
    • state - more granular location, if present
    • lat - latitude
    • lng - longitude
    • type - one of TOTAL, ICU, ACUTE (some data could include ICU beds too), PSYCHIATRIC, or OTHER (merged 'SPECIAL', 'CHRONIC DISEASE', 'CHILDREN', 'LONG TERM CARE', 'REHABILITATION', 'WOMEN', 'MILITARY')
    • measure - type of measure (per 1000 inhabitants)
    • beds - number of beds per 1000 inhabitants
    • population - population of the location, based on multiple sources and Wikipedia
    • year - source year for the beds and population data
    • source - source of data
    • source_url - URL of the original source
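
    Since the beds column is a rate per 1,000 inhabitants, absolute capacity is rate * population / 1000. A minimal pandas sketch (the filename matches the country-level file listed under Files below; treat it as an assumption):

        import pandas as pd

        # Filename from the Files section below; adjust to your local copy.
        df = pd.read_csv("hospital_beds_global_v1.csv")

        # beds is per 1,000 inhabitants, so absolute beds = rate * population / 1000.
        df["beds_absolute"] = df["beds"] * df["population"] / 1000

        # Example: estimated absolute ICU capacity per country, most recent year first.
        icu = df[df["type"] == "ICU"].sort_values("year", ascending=False)
        print(icu.groupby("country")["beds_absolute"].first())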

    Files

    For each data source: hospital_beds_per_source.csv

    US only: US arcgis + khn (state/county granularity): hospital_beds_USA.csv

    Global (state(region)/county granularity): hospital_beds_global_regional.csv

    Global (country granularity): hospital_beds_global_v1.csv

    Contributors

    • Igor Kiulian - extracting/normalizing/formatting/merging data
    • Artur Kiulian - helped with Kaggle setup
    • Augaly S. Kiedi - helped with country population data
    • Kristoffer Jan Zieba - found Swedish data sources

    Possible Improvements

    Find and merge more detailed (state/county-wise) or newer data sources.

  10. Predictive Validity Data Set

    • figshare.com
    txt
    Updated Dec 18, 2022
    Cite
    Antonio Abeyta (2022). Predictive Validity Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.17030021.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 18, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antonio Abeyta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Verbal and Quantitative Reasoning GRE scores and percentiles were collected by querying the student database for the appropriate information. Any student records that were missing data such as GRE scores or grade point average were removed from the study before the data were analyzed. The GRE scores of entering doctoral students from 2007-2012 were collected and analyzed. A total of 528 student records were reviewed. Ninety-six records were removed from the data because of a lack of GRE scores. Thirty-nine of these records belonged to MD/PhD applicants who were not required to take the GRE to be reviewed for admission. Fifty-seven more records were removed because they did not have an admissions committee score in the database. After 2011, the GRE's scoring system was changed from a scale of 200-800 points per section to 130-170 points per section; as a result, 12 more records were removed because their scores were representative of the new scoring system and therefore could not be compared to the older scores based on raw score. After removal of these 108 records from our analyses, a total of 420 student records remained, which included students that were currently enrolled, left the doctoral program without a degree, or left the doctoral program with an MS degree. To maintain consistency in the participants, we removed 100 additional records so that our analyses only considered students that had graduated with a doctoral degree. In addition, thirty-nine admissions scores were identified as outliers by statistical analysis software and removed for a final data set of 286 (see Outliers below).

    Outliers: We used the automated ROUT method included in the PRISM software to test the data for the presence of outliers which could skew our data. The false discovery rate for outlier detection (Q) was set to 1%. After removing the 96 students without a GRE score, 432 students were reviewed for the presence of outliers. ROUT detected 39 outliers that were removed before statistical analysis was performed.

    Sample: See the detailed description in the Participants section. Linear regression analysis was used to examine potential trends between GRE scores, GRE percentiles, normalized admissions scores, or GPA and outcomes between selected student groups. The D'Agostino & Pearson omnibus and Shapiro-Wilk normality tests were used to test for normality regarding outcomes in the sample. The Pearson correlation coefficient was calculated to determine the relationship between GRE scores, GRE percentiles, admissions scores, or GPA (undergraduate and graduate) and time to degree. Candidacy exam results were divided into students who either passed or failed the exam. A Mann-Whitney test was then used to test for statistically significant differences between mean GRE scores, percentiles, and undergraduate GPA and candidacy exam results. Other variables were also observed, such as gender, race, ethnicity, and citizenship status within the samples.

    Predictive metrics: The input variables used in this study were GPA and the scores and percentiles of applicants on both the Quantitative and Verbal Reasoning GRE sections. GRE scores and percentiles were examined to normalize variances that could occur between tests.

    Performance metrics: The output variables used in the statistical analyses of each data set were either the amount of time it took for each student to earn their doctoral degree, or the student's candidacy examination result.
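
    The group comparison and correlation analyses described above map onto standard scipy routines. A minimal sketch on synthetic stand-ins (the real student records are not public):

        import numpy as np
        from scipy.stats import mannwhitneyu, pearsonr

        rng = np.random.default_rng(0)

        # Synthetic GRE scores for students who passed vs. failed candidacy.
        gre_pass = rng.normal(157, 5, 120)
        gre_fail = rng.normal(153, 5, 40)

        # Mann-Whitney test for a difference between the two groups.
        u, p = mannwhitneyu(gre_pass, gre_fail)
        print(f"Mann-Whitney U = {u:.0f}, p = {p:.4f}")

        # Pearson correlation between GRE score and time to degree (years).
        gre = np.concatenate([gre_pass, gre_fail])
        time_to_degree = 6 - 0.01 * (gre - gre.mean()) + rng.normal(0, 0.5, gre.size)
        r, p = pearsonr(gre, time_to_degree)
        print(f"Pearson r = {r:.2f}, p = {p:.4f}")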

  11. UniCourt Court Data API - USA Court Records (AI Normalized)

    • datarade.ai
    .json, .csv, .xls
    Updated Jul 8, 2022
    Cite
    UniCourt (2022). UniCourt Court Data API - USA Court Records (AI Normalized) [Dataset]. https://datarade.ai/data-products/court-data-api-unicourt-2c86
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Jul 8, 2022
    Dataset provided by
    UniCourt
    Authors
    UniCourt
    Area covered
    United States
    Description

    UniCourt simplifies access to structured court records with our Court Data API, so you can search court cases via API, get real-time alerts with webhooks, streamline your account management, and get bulk access to the AI normalized court data you need.

    Search Court Cases with APIs

    • Leverage UniCourt’s easy API integrations to search state and federal (PACER) court records directly from your own internal applications and systems.
    • Access the docket entries and case details you need on the parties, attorneys, law firms, and judges involved in litigation.
    • Conduct the same detailed case searches you can in our app with our APIs and easily narrow your search results using our jurisdiction, case type, and case status filters.
    • Use our Related Cases API to search for and download all of the court data for consolidated cases from the Judicial Panel on Multidistrict Litigation, as well as associated civil and criminal cases from U.S. District Courts.

    Get Real-Time Alerts with Webhooks

    • UniCourt’s webhooks provide you with industry-leading automation tools for real-time push notifications to your internal applications for all your case tracking needs.
    • Get daily court data feeds with new case results for your automated court searches pushed directly to your applications in a structured format.
    • Use our custom search file webhook to search for and track thousands of entities at once and receive your results packaged into a custom CSV file.
    • Avoid making multiple API calls to figure out if a case has updates or not and remove the need to continuously check the status of large document orders and updates.

    Bulk Access to Court Data

    • UniCourt downloads thousands of new cases every day from state and federal courts, and we structure them, normalize them with our AI, and make them accessible in bulk via our Court Data API.
    • Our rapidly growing CrowdSourced Libraryā„¢ provides you with a massive free repository of 100+ million court cases, tens of millions of court documents, and billions of docket entries all at your fingertips.
    • Leverage your bulk access to AI normalized court data that’s been enriched with other public data sets to build your own analytics, competitive intelligence, and machine learning models.

    Streamlined Account Management

    • Easily manage your UniCourt account with information on your billing cycle and billing usage delivered to you via API.
    • Eliminate the requirement of logging in to your account to get a list of all of your invoices and use our APIs to directly download the invoices you need.
    • Get detailed data on which cases are being tracked by the users for your account and access all of the related tracking schedules for cases your users are tracking.
    • Gather complete information on the saved searches being run by your account, including the search parameters, filters, and much more.
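
    As an illustration only, a REST-style call from Python might look like the sketch below; the base URL, endpoint path, parameter names, and response fields here are hypothetical placeholders, not UniCourt's documented API (consult UniCourt's API reference for the real interface):

        import requests

        API_KEY = "your-api-key"            # hypothetical credential
        BASE = "https://api.example.com"    # placeholder, not UniCourt's real base URL

        # Hypothetical case-search endpoint and parameters, for illustration only.
        resp = requests.get(
            f"{BASE}/caseSearch",
            params={"q": "party:(Acme Corp)", "pageNumber": 1},
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        for case in resp.json().get("cases", []):  # "cases" is a hypothetical field
            print(case.get("caseName"), case.get("caseNumber"))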
