100+ datasets found
  1. Types of unique data points collection in selected iOS fitness apps 2024

    • statista.com
    Updated Feb 25, 2025
    Cite
    Statista (2025). Types of unique data points collection in selected iOS fitness apps 2024 [Dataset]. https://www.statista.com/statistics/1559485/collection-and-tracking-ios-fitness-apps/
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 30, 2024
    Area covered
    Worldwide
    Description

    In 2024, the fitness app Strava had the largest number of collected data points linked to its users. Of the 21 collected data types, 20 were linked to users' identity, while two data points could potentially be used to track users. The Nike Training Club app was found to collect four data points that could potentially be used to track users. Fitbit, Future Personal Training, and Fitness by Apple did not list any data points that could potentially track users.

  2. Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data...

    • datarade.ai
    .json, .csv, .xls
    Updated Sep 7, 2024
    Cite
    Altosight (2024). Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data Points | Bypassing All CAPTCHAs & Blocking Mechanisms | GDPR Compliant [Dataset]. https://datarade.ai/data-products/altosight-ai-custom-web-scraping-data-100-global-free-altosight
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    Altosight
    Area covered
    Chile, Svalbard and Jan Mayen, Tajikistan, Guatemala, Singapore, Côte d'Ivoire, Greenland, Wallis and Futuna, Paraguay, Czech Republic
    Description

    Altosight | AI Custom Web Scraping Data

    ✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs and blocking mechanisms and handles dynamic content.

    We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.

    ✦ Our solution offers free unlimited data points across any project, with no additional setup costs.

    We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.

    ― Key Use Cases ―

    ➤ Price Monitoring & Repricing Solutions

    🔹 Automatic repricing, AI-driven repricing, and custom repricing rules
    🔹 Receive price suggestions via API or CSV to stay competitive
    🔹 Track competitors in real-time or at scheduled intervals

    ➤ E-commerce Optimization

    🔹 Extract product prices, reviews, ratings, images, and trends
    🔹 Identify trending products and enhance your e-commerce strategy
    🔹 Build dropshipping tools or marketplace optimization platforms with our data

    ➤ Product Assortment Analysis

    🔹 Extract the entire product catalog from competitor websites
    🔹 Analyze product assortment to refine your own offerings and identify gaps
    🔹 Understand competitor strategies and optimize your product lineup

    ➤ Marketplaces & Aggregators

    🔹 Crawl entire product categories and track best-sellers
    🔹 Monitor position changes across categories
    🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis

    ➤ Business Website Data

    🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis

    🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies

    ➤ Domain Name Data

    🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts

    ➤ Real Estate Data

    🔹 Access property listings, prices, and availability
    🔹 Analyze trends and opportunities for investment or sales strategies

    ― Data Collection & Quality ―

    ► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators

    ► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction

    ► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more

    ► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence

    ► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project

    ► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction

    ― Why Choose Altosight? ―

    ✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges

    ✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are

    ✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs

    ✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations

    ✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment

    ✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems

    ✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day

    ― Custom Projects & Real-Time Data ―

    ✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals

    ✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...

  3. Types of unique data points collection in selected iOS weight loss apps 2025...

    • statista.com
    Updated Feb 26, 2025
    Cite
    Statista (2025). Types of unique data points collection in selected iOS weight loss apps 2025 [Dataset]. https://www.statista.com/statistics/1559523/collection-and-tracking-ios-nutrition-apps/
    Explore at:
    Dataset updated
    Feb 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 8, 2025
    Area covered
    Worldwide
    Description

    In 2024, the Calorie Counter app had the largest number of collected data points possibly linked to user identity. Of the 22 collected data types, 20 were linked to users' identity, while seven data points could potentially be used to track users. The calorie-counting app Eato did not list any collected data types that could potentially be used to track users. The iOS mobile app for the Weight Watchers program collected seven data points that were not linked to users.

  4. Heidelberg Tributary Loading Program (HTLP) Dataset

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin, png
    Updated Jul 16, 2024
    Cite
    NCWQR; NCWQR (2024). Heidelberg Tributary Loading Program (HTLP) Dataset [Dataset]. http://doi.org/10.5281/zenodo.6606950
    Explore at:
    Available download formats: bin, png
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    NCWQR; NCWQR
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is updated more frequently and can be visualized on NCWQR's data portal.

    If you have any questions, please contact Dr. Laura Johnson or Dr. Nathan Manning.

    The National Center for Water Quality Research (NCWQR) is a research laboratory at Heidelberg University in Tiffin, Ohio, USA. Our primary research program is the Heidelberg Tributary Loading Program (HTLP), where we currently monitor water quality at 22 river locations throughout Ohio and Michigan, effectively covering ~half of the land area of Ohio. The goal of the program is to accurately measure the total amounts (loads) of pollutants exported from watersheds by rivers and streams. Thus these data are used to assess different sources (nonpoint vs point), forms, and timing of pollutant export from watersheds. The HTLP officially began with high-frequency monitoring for sediment and nutrients from the Sandusky and Maumee rivers in 1974, and has continually expanded since then.
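    The "loads" the program measures are concentrations multiplied by river discharge and integrated over time. A minimal sketch of that arithmetic, using made-up numbers rather than actual HTLP data:

```python
# Illustrative load arithmetic (hypothetical values, not HTLP data):
# a load is concentration times discharge integrated over time.
# Since 1 mg/L == 1 g/m^3, conc * Q gives grams per second.

def load_kg(conc_mg_per_l, discharge_m3_per_s, seconds):
    """Pollutant load in kilograms over the given interval."""
    return conc_mg_per_l * discharge_m3_per_s * seconds / 1000.0

# Three samples per day at 8-hour intervals, as in the current HTLP design
samples = [(0.12, 35.0), (0.15, 42.0), (0.10, 33.0)]  # (TP mg/L, Q m^3/s)
eight_hours = 8 * 3600
daily_total = sum(load_kg(c, q, eight_hours) for c, q in samples)
print(round(daily_total, 2))
```

    Real load programs use more careful flow-weighting than this equal-interval sum, but the unit conversion is the core of the calculation.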

    Each station where samples are collected for water quality is paired with a US Geological Survey gage for quantifying discharge (http://waterdata.usgs.gov/usa/nwis/rt). Our stations cover a wide range of watershed areas upstream of the sampling point from 11.0 km2 for the unnamed tributary to Lost Creek to 19,215 km2 for the Muskingum River. These rivers also drain a variety of land uses, though a majority of the stations drain over 50% row-crop agriculture.

    At most sampling stations, submersible pumps located on the stream bottom continuously pump water into sampling wells inside heated buildings where automatic samplers collect discrete samples (4 unrefrigerated samples/d at 6-h intervals, 1974–1987; 3 refrigerated samples/d at 8-h intervals, 1988-current). At weekly intervals the samples are returned to the NCWQR laboratories for analysis. When samples either have high turbidity from suspended solids or are collected during high flow conditions, all samples for each day are analyzed. As stream flows and/or turbidity decreases, analysis frequency shifts to one sample per day. At the River Raisin and Muskingum River, a cooperator collects a grab sample from a bridge at or near the USGS station approximately daily and all samples are analyzed. Each sample bottle contains sufficient volume to support analyses of total phosphorus (TP), dissolved reactive phosphorus (DRP), suspended solids (SS), total Kjeldahl nitrogen (TKN), ammonium-N (NH4), nitrate-N and nitrite-N (NO2+3), chloride, fluoride, and sulfate. Nitrate and nitrite are commonly added together when presented; henceforth we refer to the sum as nitrate.

    Upon return to the laboratory, all water samples are analyzed within 72h for the nutrients listed below using standard EPA methods. For dissolved nutrients, samples are filtered through a 0.45 um membrane filter prior to analysis. We currently use a Seal AutoAnalyzer 3 for DRP, silica, NH4, TP, and TKN colorimetry, and a DIONEX Ion Chromatograph with AG18 and AS18 columns for anions. Prior to 2014, we used a Seal TRAACs for all colorimetry.

    2017 Ohio EPA Project Study Plan and Quality Assurance Plan

    Project Study Plan

    Quality Assurance Plan

    Data quality control and data screening

    The data provided in the River Data files have all been screened by NCWQR staff. The purpose of the screening is to remove outliers that staff deem likely to reflect sampling or analytical errors rather than outliers that reflect the real variability in stream chemistry. Often, in the screening process, the causes of the outlier values can be determined and appropriate corrective actions taken. These may involve correction of sample concentrations or deletion of those data points.

    This micro-site contains data for approximately 126,000 water samples collected beginning in 1974. We cannot guarantee that each data point is free from sampling bias/error, analytical errors, or transcription errors. However, since its beginnings, the NCWQR has operated a substantial internal quality control program and has participated in numerous external quality control reviews and sample exchange programs. These programs have consistently demonstrated that data produced by the NCWQR is of high quality.

    A note on detection limits and zero and negative concentrations

    It is routine practice in analytical chemistry to determine method detection limits and/or limits of quantitation, below which analytical results are considered less reliable or unreliable. This is something that we also do as part of our standard procedures. Many laboratories, especially those associated with agencies such as the U.S. EPA, do not report individual values that are less than the detection limit, even if the analytical equipment returns such values. This is in part because as individual measurements they may not be considered valid under litigation.

    The measured concentration consists of the true but unknown concentration plus random instrument error, which is usually small compared to the range of expected environmental values. In a sample for which the true concentration is very small, perhaps even essentially zero, it is possible to obtain an analytical result of 0 or even a small negative concentration. Results of this sort are often “censored” and replaced with the statement “

    Censoring these low values creates a number of problems for data analysis. How do you take an average? If you leave out these numbers, you get a biased result because you did not toss out any other (higher) values. Even if you replace negative concentrations with 0, a bias ensues, because you’ve chopped off some portion of the lower end of the distribution of random instrument error.

    For these reasons, we do not censor our data. Values of -9 and -1 are used as missing value codes, but all other negative and zero concentrations are actual, valid results. Negative concentrations make no physical sense, but they make analytical and statistical sense. Users should be aware of this, and if necessary make their own decisions about how to use these values. Particularly if log transformations are to be used, some decision on the part of the user will be required.
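    The averaging bias described above is easy to demonstrate with a short simulation (the noise level here is an assumption for illustration, not the NCWQR's actual instrument error):

```python
import random

# Simulate a sample whose true concentration is essentially zero,
# measured with symmetric instrument noise.
random.seed(42)
true_conc = 0.0
measured = [true_conc + random.gauss(0.0, 0.05) for _ in range(10_000)]

mean_raw = sum(measured) / len(measured)                      # negatives kept
mean_censored = sum(max(x, 0.0) for x in measured) / len(measured)

# The raw mean stays near the true value; replacing negatives with 0
# removes the lower tail of the error distribution and biases the mean high.
print(round(mean_raw, 4), round(mean_censored, 4))
```

    With this dataset, the only values to drop before averaging are the -9 and -1 missing-value codes; all other zero and negative concentrations should be kept.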

    Analyte Detection Limits

    https://ncwqr.files.wordpress.com/2021/12/mdl-june-2019-epa-methods.jpg?w=1024

    For more information, please visit https://ncwqr.org/

  5. [Superseded] Intellectual Property Government Open Data 2019

    • researchdata.edu.au
    • data.gov.au
    Updated Jun 6, 2019
    + more versions
    Cite
    IP Australia (2019). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://researchdata.edu.au/superseded-intellectual-property-data-2019/2994670
    Explore at:
    Dataset updated
    Jun 6, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    IP Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making the files too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
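    Any of the environments above can handle the merging. As one lightweight, standard-library option in Python, sqlite3 can join tables far too large for Excel. The table and column names below are hypothetical, not the actual IPGOD schema; in practice you would bulk-load the CSVs into the database first.

```python
import sqlite3

# Hypothetical IPGOD-style tables, created in memory for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE applications (appl_id TEXT, filing_date TEXT)")
con.execute("CREATE TABLE applicants (appl_id TEXT, applicant_name TEXT)")
con.executemany("INSERT INTO applications VALUES (?, ?)",
                [("2019900001", "2019-01-15"), ("2019900002", "2019-02-01")])
con.executemany("INSERT INTO applicants VALUES (?, ?)",
                [("2019900001", "Acme Pty Ltd")])

# A LEFT JOIN keeps applications that have no matched applicant record.
rows = con.execute("""
    SELECT a.appl_id, a.filing_date, p.applicant_name
    FROM applications a
    LEFT JOIN applicants p USING (appl_id)
    ORDER BY a.appl_id
""").fetchall()
print(rows)
```

    SQLite processes the join on disk (or in memory) rather than loading whole tables into a spreadsheet, which is what makes multi-gigabyte table merges feasible on a modest machine.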

    IP Data Platform

    IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as IPGOD, through the web browser, without any installation of software. IP Data Platform

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.

    * Patents
    * Trade Marks
    * Designs
    * Plant Breeder's Rights

    Updates

    Tables and columns

    Due to the changes in our systems, some tables have been affected.

    * We have added IPGOD 225 and IPGOD 325 to the dataset!
    * The IPGOD 206 table is not available this year.
    * Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables.

    * Null values are simply empty rather than '31/12/9999'.
    * All date columns are now in ISO format 'yyyy-mm-dd'.
    * All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    * All tables are encoded in UTF-8.
    * All tables use the backslash \ as the escape character.
    * The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
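    The conventions above can be handled directly with Python's standard csv module. A sketch using hypothetical column names (not the actual IPGOD data dictionary):

```python
import csv
import io
from datetime import date

# Inline sample following the stated IPGOD 2019 conventions: UTF-8,
# ISO 'yyyy-mm-dd' dates, True/False indicators, empty-string nulls,
# and backslash as the escape character. Column names are hypothetical.
sample = "appl_id,filing_date,is_pct\n2019900001,2019-01-15,True\n2019900002,,False\n"
reader = csv.DictReader(io.StringIO(sample), escapechar="\\")

rows = []
for r in reader:
    rows.append({
        "appl_id": r["appl_id"],
        # Nulls are empty strings, not sentinel dates like '31/12/9999'.
        "filing_date": date.fromisoformat(r["filing_date"]) if r["filing_date"] else None,
        # Indicators are the literal strings 'True'/'False'.
        "is_pct": r["is_pct"] == "True",
    })
print(rows)
```

    For real files, open them with `encoding="utf-8"` and pass the same `escapechar="\\"` to match the published escaping.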

  6. Delaware River and Upper Bay Sediment Data

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated Oct 31, 2024
    + more versions
    Cite
    NOAA Office for Coastal Management (Point of Contact, Custodian) (2024). Delaware River and Upper Bay Sediment Data [Dataset]. https://catalog.data.gov/dataset/delaware-river-and-upper-bay-sediment-data1
    Explore at:
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Area covered
    Delaware River
    Description

    The area of coverage consists of 192 square miles of benthic habitat mapped from 2005 to 2007 in the Delaware River and Upper Delaware Bay. The bottom sediment map was constructed through the use of a Roxann Seabed Classification System and extensive sediment grab samples. Data was collected in a gridded trackline configuration, with trackline spacing of 100 meters parallel to the shoreline and 200 meters perpendicular to the shoreline. This project is an extension of the work currently being performed in Delaware waters by DNREC's Delaware Coastal Program's Delaware Bay Benthic Mapping Project.

    The bottom sediment point data, which have been classified according to the existing benthic mapping Roxann box plot, are converted from a number that categorizes each point according to its corresponding box (in the Roxann) into a number that reflects the sediment properties of each box in relation to one another. A ranking scale is used to allow a statistical gridding scheme to interpolate between sediment data points, while minimizing erroneous sediment classifications and allowing gradational sediment deposits to be gridded. A ranking scale from 0 to 28 was used for this project, with 0 representing the finest grained classifications (fluidized clay) and 28 representing the coarsest grained classifications (dense shell material). Table 1 illustrates the distribution of sediment classifications along the ranking scale, which takes into account the relation of sediment types and grain sizes to one another using both the Wentworth Scale and Shepard's classification system. Finer grains, such as clays and silts, are more similar in their deposition environments because they reflect similar current regimes, sorting, and reworking patterns (Poppe et al., 2003). While coarse sediments are much more dissimilar to finer grains with respect to current velocities, sorting, and winnowing, the finer grains are much more closely related in their sediment diameters than the coarser grains as you increase in Phi size and/or diameter. This accounts for the close clustering of coarse-grained deposit descriptions at the upper end of the ranking scale, while the finer-grained sediments show a gradation as you move up the rating scale.

    The bottom sediment data is gridded in Surfer 8, a surface and terrain modeling program, using block kriging and a nugget effect. This statistical gridding technique estimates the average value of a variable within a prescribed local area (Isaaks and Srivastava, 1989). Block kriging uses the existing point data values, weighted according to their proximity to the point being estimated, to discretize the local area into an array of estimated data value points, and then averages those individual point estimates together to get an average estimated value over the area of interest (Isaaks and Srivastava, 1989). A variogram is constructed for the data, and the resultant spatial model developed from the variogram is used in the block kriging surface model to more accurately interpolate the sediment data. The fitted model was a nugget effect (with an error variance of 21.8%) and a linear model (with a slope of 0.00286 and an anisotropy of 1, which represents a complete lack of spatial correlation). The accuracy of the estimation depends upon the grid size of the area of interpolation, the size of each cell within the grid, and the number of discretized data points needed to estimate the cells within that grid spacing. The grid size used to interpolate the bottom sediment maps was 442 lines x 454 lines, with a cell size of 44.93 m2.

    The nugget effect is added to allow the gridding to assume there is very little, if any, lateral correlation or trend within the bottom sediment (Isaaks and Srivastava, 1989). The nugget effect model entails a complete lack of spatial correlation; the point data values at any particular location bear no similarity even to adjacent data values (Isaaks and Srivastava, 1989). Without the nugget effect, the gridding would assume that you could only have a linear progression of sediment types and would insert all the sediment types along the scale between two sediment types (i.e., silty fine to medium sands and fine to medium sand with varying amounts of pebbles would be inserted between fine sand and coarse sand), even though that is not what is occurring along the bottom. The sediment data is gridded with no drift for the data interpolation, also helping to minimize erroneous classifications.

    Sediment Classification Ranking

    Ranking: Sediment Description
    0-1, 1-2: Clay
    2-3, 3-4, 4-5, 5-6, 6-7: Silt
    7-8, 8-9: Sandy Silts
    9-10, 10-11: Fine Sand
    11-12, 12-13: Silty Fine to Medium Sands
    13-14: Silty Medium Sand
    14-15, 15-16: Fine to Medium Sand
    16-17, 17-18: Fine to Medium Sand with abundant shell material and/or pebbles
    18-19, 19-20: Coarse Sand with varying amounts of pebbles
    20-21, 21-22, 22-23: Moderate Shell Material/Sandy Pebbles
    23-24, 24-25, 25-26: Abundant Shell Material/Gravel
    26-27, 27-28: Dense Oyster Shell

    Original contact information: Contact Name: Bartholomew Wilson; Contact Org: Delaware DNREC Coastal Programs; Phone: 302-739-9283
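    For users mapping interpolated ranking values back to sediment descriptions, a minimal lookup sketch in Python (the band groupings are inferred from the ranking table in this description; bands without their own label are folded into the next labelled class, which is an assumption):

```python
import bisect

# Upper bound of each labelled band on the 0-28 ranking scale.
band_upper = [2, 7, 9, 11, 13, 14, 16, 18, 20, 23, 26, 28]
band_label = [
    "Clay", "Silt", "Sandy Silts", "Fine Sand",
    "Silty Fine to Medium Sands", "Silty Medium Sand",
    "Fine to Medium Sand",
    "Fine to Medium Sand with abundant shell material and/or pebbles",
    "Coarse Sand with varying amounts of pebbles",
    "Moderate Shell Material/Sandy Pebbles",
    "Abundant Shell Material/Gravel", "Dense Oyster Shell",
]

def classify(rank):
    """Map an interpolated ranking value (0-28) to a sediment description."""
    return band_label[bisect.bisect_left(band_upper, rank)]

print(classify(1.0), "|", classify(10.5), "|", classify(27.5))
```

    Exact band boundaries are assigned to the finer class here; choose the convention that matches your gridded output.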

  7. Success.ai | LinkedIn Data | 700M Public Profiles & 70M Companies – Best...

    • datarade.ai
    Updated Jan 1, 2022
    + more versions
    Cite
    Success.ai (2022). Success.ai | LinkedIn Data | 700M Public Profiles & 70M Companies – Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-linkedin-data-700m-public-profiles-70m-compa-success-ai-294c
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jan 1, 2022
    Dataset provided by
    Area covered
    Austria, Luxembourg, Singapore, Montserrat, Greenland, Mauritius, Saudi Arabia, Estonia, Virgin Islands (British), Mayotte
    Description

    Success.ai’s LinkedIn Data Solutions offer unparalleled access to a vast dataset of 700 million public LinkedIn profiles and 70 million LinkedIn company records, making it one of the most comprehensive and reliable LinkedIn datasets available on the market today. Our employee data and LinkedIn data are ideal for businesses looking to streamline recruitment efforts, build highly targeted lead lists, or develop personalized B2B marketing campaigns.

    Whether you’re looking for recruiting data, conducting investment research, or seeking to enrich your CRM systems with accurate and up-to-date LinkedIn profile data, Success.ai provides everything you need with pinpoint precision. By tapping into LinkedIn company data, you’ll have access to over 40 critical data points per profile, including education, professional history, and skills.

    Key Benefits of Success.ai’s LinkedIn Data: Our LinkedIn data solution offers more than just a dataset. With GDPR-compliant data, AI-enhanced accuracy, and a price match guarantee, Success.ai ensures you receive the highest-quality data at the best price in the market. Our datasets are delivered in Parquet format for easy integration into your systems, and with millions of profiles updated daily, you can trust that you’re always working with fresh, relevant data.

    Global Reach and Industry Coverage: Our LinkedIn data covers professionals across all industries and sectors, providing you with detailed insights into businesses around the world. Our geographic coverage spans 259M profiles in the United States, 22M in the United Kingdom, 27M in India, and thousands of profiles in regions such as Europe, Latin America, and Asia Pacific. With LinkedIn company data, you can access profiles of top companies from the United States (6M+), United Kingdom (2M+), and beyond, helping you scale your outreach globally.

    Why Choose Success.ai’s LinkedIn Data: Success.ai stands out for its tailored approach and white-glove service, making it easy for businesses to receive exactly the data they need without managing complex data platforms. Our dedicated Success Managers will curate and deliver your dataset based on your specific requirements, so you can focus on what matters most—reaching the right audience. Whether you’re sourcing employee data, LinkedIn profile data, or recruiting data, our service ensures a seamless experience with 99% data accuracy.

    • Best Price Guarantee: We offer unbeatable pricing on LinkedIn data, and we’ll match any competitor.
    • Global Scale: Access 700 million LinkedIn profiles and 70 million company records globally.
    • AI-Verified Accuracy: Enjoy 99% data accuracy through our advanced AI and manual validation processes.
    • Real-Time Data: Profiles are updated daily, ensuring you always have the most relevant insights.
    • Tailored Solutions: Get custom-curated LinkedIn data delivered directly, without managing platforms.
    • Ethically Sourced Data: Compliant with global privacy laws, ensuring responsible data usage.
    • Comprehensive Profiles: Over 40 data points per profile, including job titles, skills, and company details.
    • Wide Industry Coverage: Covering sectors from tech to finance across regions like the US, UK, Europe, and Asia.

    Key Use Cases:

    • Sales Prospecting and Lead Generation: Build targeted lead lists using LinkedIn company data and professional profiles, helping sales teams engage decision-makers at high-value accounts.
    • Recruitment and Talent Sourcing: Use LinkedIn profile data to identify and reach top candidates globally. Our employee data includes work history, skills, and education, providing all the details you need for successful recruitment.
    • Account-Based Marketing (ABM): Use our LinkedIn company data to tailor marketing campaigns to key accounts, making your outreach efforts more personalized and effective.
    • Investment Research & Due Diligence: Identify companies with strong growth potential using LinkedIn company data. Access key data points such as funding history, employee count, and company trends to fuel investment decisions.
    • Competitor Analysis: Stay ahead of your competition by tracking hiring trends, employee movement, and company growth through LinkedIn data. Use these insights to adjust your market strategy and improve your competitive positioning.
    • CRM Data Enrichment: Enhance your CRM systems with real-time updates from Success.ai’s LinkedIn data, ensuring that your sales and marketing teams are always working with accurate and up-to-date information.
    • Comprehensive Data Points for LinkedIn Profiles: Our LinkedIn profile data includes over 40 key data points for every individual and company, ensuring a complete understanding of each contact:

    LinkedIn URL: Access direct links to LinkedIn profiles for immediate insights.
    Full Name: Verified first and last names.
    Job Title: Current job titles and prior experience.
    Company Information: Company name, LinkedIn URL, domain, and location.
    Work and Per...

  8. 2017 Countywide LiDAR Point Cloud

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Sep 1, 2022
    + more versions
    Cite
    Lake County Illinois GIS (2022). 2017 Countywide LiDAR Point Cloud [Dataset]. https://catalog.data.gov/dataset/2017-countywide-lidar-point-cloud-638f8
    Explore at:
    Dataset updated
    Sep 1, 2022
    Dataset provided by
    Lake County Illinois GIS
    Description

    Click here to access the data directly from the Illinois State Geospatial Data Clearinghouse. These lidar data are processed Classified LAS 1.4 files, formatted to 2,117 individual 2500 ft x 2500 ft tiles; used to create Reflectance Images, 3D breaklines and hydro-flattened DEMs as necessary. Geographic Extent: Lake County, Illinois, covering approximately 466 square miles. Dataset Description: The WI Kenosha-Racine Counties and IL 4 County QL1 Lidar project called for the planning, acquisition, processing, and derivative products of lidar data to be collected at a derived nominal pulse spacing (NPS) of 1 point every 0.35 meters. Project specifications are based on the U.S. Geological Survey National Geospatial Program Base Lidar Specification, Version 1.2. The data was developed based on a horizontal projection/datum of NAD83 (2011), State Plane, U.S. Survey Feet and vertical datum of NAVD88 (GEOID12B), U.S. Survey Feet. Lidar data was delivered as processed Classified LAS 1.4 files, formatted to 2,117 individual 2500 ft x 2500 ft tiles, as tiled Reflectance Imagery, and as tiled bare earth DEMs; all tiled to the same 2500 ft x 2500 ft schema. Ground Conditions: Lidar was collected April-May 2017, while no snow was on the ground and rivers were at or below normal levels. In order to post-process the lidar data to meet task order specifications and meet ASPRS vertical accuracy guidelines, Ayers established a total of 66 ground control points that were used to calibrate the lidar to known ground locations established throughout the WI Kenosha-Racine Counties and IL 4 County QL1 project area. An additional 195 independent accuracy checkpoints, 116 in Bare Earth and Urban landcovers (116 NVA points) and 79 in Tall Grass and Brushland/Low Trees categories (79 VVA points), were used to assess the vertical accuracy of the data. These checkpoints were not used to calibrate or post-process the data.
Users should be aware that temporal changes may have occurred since this dataset was collected and that some parts of these data may no longer represent actual surface conditions. Users should not use these data for critical applications without a full awareness of their limitations. Acknowledgement of the U.S. Geological Survey would be appreciated for products derived from these data. These LAS data files include all data points collected. No points have been removed or excluded. A visual qualitative assessment was performed to ensure data completeness. No void areas or missing data exist. The raw point cloud is of good quality and the data pass Non-Vegetated Vertical Accuracy specifications. Link Source: Illinois Geospatial Data Clearinghouse
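    The NVA/VVA checkpoint assessment described above follows the ASPRS vertical accuracy convention: NVA is reported as RMSEz × 1.96 over non-vegetated checkpoints, while VVA is the 95th percentile of absolute errors in vegetated land cover. A minimal sketch of that computation (the function name, inputs, and return format are illustrative, not part of this dataset):

```python
import math

def vertical_accuracy(dz_nva, dz_vva):
    """Vertical accuracy summary in the style of the ASPRS (2014) standard.

    dz_nva: elevation errors (lidar minus survey) at non-vegetated checkpoints.
    dz_vva: elevation errors at vegetated checkpoints.
    """
    rmse_z = math.sqrt(sum(e * e for e in dz_nva) / len(dz_nva))
    nva_95 = 1.96 * rmse_z  # 95% confidence, assumes normally distributed errors
    # VVA uses the 95th percentile of |error|, with no normality assumption.
    abs_err = sorted(abs(e) for e in dz_vva)
    idx = max(0, math.ceil(0.95 * len(abs_err)) - 1)
    vva_95 = abs_err[idx]
    return {"rmse_z": rmse_z, "nva_95": nva_95, "vva_95": vva_95}
```

    Running the 116 NVA and 79 VVA checkpoint errors through such a function is how a reported "passes Non-Vegetated Vertical Accuracy specifications" claim would be checked.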

  9. United States COVID-19 Community Levels by County

    • data.cdc.gov
    • data.virginia.gov
    • +1more
    application/rdfxml +5
    Updated Nov 2, 2023
    + more versions
    Cite
    CDC COVID-19 Response (2023). United States COVID-19 Community Levels by County [Dataset]. https://data.cdc.gov/Public-Health-Surveillance/United-States-COVID-19-Community-Levels-by-County/3nnm-4jni
    Explore at:
    application/rdfxml, application/rssxml, csv, tsv, xml, json
    Available download formats
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Authors
    CDC COVID-19 Response
    License

    https://www.usa.gov/government-works

    Area covered
    United States
    Description

    Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

    This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.

    The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

    Using these data, the COVID-19 community level was classified as low, medium, or high.
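    The classification logic described above can be sketched as follows. The numeric thresholds are a reconstruction of CDC's published cut-offs, not code from the CDC pipeline; the level is the higher of the admissions and bed-occupancy indicators, with the case rate selecting which threshold table applies:

```python
def community_level(cases_per_100k, admissions_per_100k, bed_utilization_pct):
    """Classify a county's COVID-19 community level as low/medium/high.

    Illustrative reconstruction of CDC's published threshold rules.
    """
    if cases_per_100k < 200:
        if admissions_per_100k >= 20 or bed_utilization_pct >= 15:
            return "high"
        if admissions_per_100k >= 10 or bed_utilization_pct >= 10:
            return "medium"
        return "low"
    # At 200+ new cases per 100k, the floor rises to medium.
    if admissions_per_100k >= 10 or bed_utilization_pct >= 10:
        return "high"
    return "medium"
```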

    COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.

    For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.

    Archived Data Notes:

    This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.

    March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.

    March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.

    March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.

    March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.

    March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).

    March 31, 2022: Data values for columns “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added beginning the week of 3/3/2022; thus, the values were previously missing for records released the week prior.

    April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.

    April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.

    May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.

    May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19th, 2022, have been corrected for a data processing error. Of note, Broward, Miami-Dade, Palm Beach Counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.

    May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.

    June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.

    June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.

    June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.

    July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.

    July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.

    July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.

    July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.

    July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.

    August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.

    August 4, 2022: COVID-19 Community Level (CCL) data for the week of July 28, 2022 for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) case data was missing due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.

    August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCL levels published on August 4, 2022 should be interpreted with caution.

    August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.

    August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.

    August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.

    August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.

    September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.

    September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,

  10. Data from: A two-dimensional interpolation function for irregularly-spaced...

    • hosted-metadata.bgs.ac.uk
    Updated Jan 1, 1968
    Cite
    British Geological Survey (1968). A two-dimensional interpolation function for irregularly-spaced data [Dataset]. https://hosted-metadata.bgs.ac.uk/geonetwork/srv/api/records/f56eb439-5bd9-4111-990b-820a94ad9092?language=all
    Explore at:
    Dataset updated
    Jan 1, 1968
    Dataset provided by
    British Geological Survey (https://www.bgs.ac.uk/)
    Harvard University
    Description

    This is a peer-reviewed publication looking at methods of data interpretation relevant to geochemical datasets. In many fields using empirical areal data there arises a need for interpolating from irregularly-spaced data to produce a continuous surface. These irregularly spaced locations, hence referred to as "data points," may have diverse meanings: in meteorology, weather observation stations; in geography, surveyed locations; in city and regional planning, centers of data-collection zones; in biology, observation locations. It is assumed that a unique number (such as rainfall in meteorology, or altitude in geography) is associated with each data point. In order to display these data in some type of contour map or perspective view, to compare them with data for the same region based on other data points, or to analyze them for extremes, gradients, or other purposes, it is extremely useful, if not essential, to define a continuous function fitting the given values exactly. Interpolated values over a fine grid may then be evaluated. In using such a function it is assumed that the original data are without error, or that compensation for error will be made after interpolation. In essence, an operational solution to the problem of two-dimensional interpolation from irregularly-spaced data points is desired. It is assumed that a finite number N of triplets (xi, yi, zi) are given, where xi, yi are the locational coordinates of the data point Di, and zi is the corresponding data value. Data point locations may not be coincident. An interpolation function z = f(x, y) to assign a value to any location P(x, y) in the plane is sought. This two-dimensional interpolation function is to be "smooth" (continuous and once differentiable), to pass through the specified points (i.e., f(xi, yi) = zi), and to meet the user's intuitive expectations about the phenomenon under investigation. Furthermore, the function should be suitable for computer application at reasonable cost.
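    The scheme the publication describes (exact fit at every data point, smooth elsewhere) is commonly realized by inverse-distance weighting. A minimal sketch in that spirit; the function name, data layout, and power parameter are our own choices, not the paper's notation:

```python
import math

def idw_interpolate(points, x, y, power=2.0):
    """Inverse-distance-weighted interpolation over irregularly spaced data.

    points: list of (xi, yi, zi) triplets. The function passes exactly
    through each data point, i.e. f(xi, yi) = zi.
    """
    weights, weighted_sum = 0.0, 0.0
    for xi, yi, zi in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # exact fit at a data point
        w = d ** -power  # nearer data points dominate the estimate
        weights += w
        weighted_sum += w * zi
    return weighted_sum / weights
```

    Evaluating such a function over a fine grid yields the continuous surface needed for contour maps or perspective views.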

    Website:

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.6880&rep=rep1&type=pdf

  11. Data from: Medicare Spending per Beneficiary

    • kaggle.com
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Medicare Spending per Beneficiary [Dataset]. https://www.kaggle.com/datasets/thedevastator/medicare-spending-per-beneficiary
    Explore at:
    Croissant
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 22, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Medicare Spending per Beneficiary

    Detailed Hospital Expense Breakdown

    By Health [source]

    About this dataset

    This file allows healthcare executives and analysts to make informed decisions about whether continued improvements are being made over time, so that they can understand how efficiently they are fulfilling treatments while staying within budgetary constraints. Additionally, it will help them map out trends among different hospitals and spot anomalies that could indicate areas where decisions should be reassessed.



    How to use the dataset

    This dataset can provide valuable insights into how Medicare is spending per patient at specific hospitals in the United States. It can be used to gain a better understanding of the types of services covered under Medicare, and to what extent those services are being used. By comparing the average Medicare spending across different hospitals, users can also gain insight into potential disparities in care delivery or availability.

    To use this dataset, first identify which hospital you are interested in analyzing. Then locate the row for that hospital in the dataset and review its associated values: value, footnote (optional), and start/end dates (optional). The Value column refers to how much Medicare spends on each particular patient; this is a numerical value represented as a decimal number up to 6 decimal places. The Footnote (optional) provides more information about any special circumstances that may need attention when interpreting the value data points. Finally, if Start Date and End Date fields are present they will specify over what timeframe these values were aggregated over.

    Once all relevant data elements have been reviewed for all hospitals of interest, comparison analysis among them can be conducted based on Value, Footnote, or Start/End dates as necessary to answer specific research questions or to draw conclusions about how Medicare spends per patient at various hospitals nationwide.
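    The comparison workflow above can be sketched as follows, using hypothetical rows shaped like the file's column dictionary (a Value per hospital); the hospital identifiers and figures are invented for illustration:

```python
from statistics import mean

# Hypothetical rows: Medicare spending value per hospital.
rows = [
    {"hospital": "A", "value": 0.98},
    {"hospital": "B", "value": 1.12},
    {"hospital": "C", "value": 0.87},
]

# Flag hospitals spending above the group average, then rank by spending.
avg = mean(r["value"] for r in rows)
for r in rows:
    r["above_average"] = r["value"] > avg
ranked = sorted(rows, key=lambda r: r["value"], reverse=True)
```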

    Research Ideas

    • Developing a cost comparison tool for hospitals that allows patients to compare how much Medicare spends per patient across different hospitals.
    • Creating an algorithm to help predict Medicare spending at different facilities over time and build strategies on how best to manage those costs.
    • Identifying areas in which a hospital can save money by reducing unnecessary spending in order to reduce overall Medicare expenses

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors. You are free to:

    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:

    • Give appropriate credit - provide a link to the license, and indicate if changes were made.
    • ShareAlike - distribute your contributions under the same license as the original.
    • Keep intact all notices that refer to this license, including copyright notices.

    Columns

    File: Medicare_hospital_spending_per_patient_Medicare_Spending_per_Beneficiary_Additional_Decimal_Places.csv

    | Column name | Description |
    |:------------|:------------|
    | Value       | The amount of Medicare spending per patient for a given hospital or region. (Numeric) |
    | Footnote    | Any additional notes or information related to the value. (Text) |
    | Start_Date  | The start date of the period for which the value applies. (Date) |
    | End_Date    | The end date of the period for which the value applies. (Date) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Health.

  12. Whitefish Lake Institute Long-Term Monitoring Dataset (2007-2021)

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Feb 28, 2023
    Cite
    Meghan Robinson; W. Adam Sigler; Mike Koopal (2023). Whitefish Lake Institute Long-Term Monitoring Dataset (2007-2021) [Dataset]. http://doi.org/10.4211/hs.5ca7307fda8949299e6782885da95046
    Explore at:
    zip (219.0 MB)
    Available download formats
    Dataset updated
    Feb 28, 2023
    Dataset provided by
    HydroShare
    Authors
    Meghan Robinson; W. Adam Sigler; Mike Koopal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 27, 2007 - Nov 3, 2021
    Area covered
    Description

    This resource contains data collected by the Whitefish Lake Institute (WLI) as well as R code used to compile and conduct quality assurance on the data. This resource reflects joint publication efforts between WLI and the Montana State University Extension Water Quality (MSUEWQ) program. All data included here was uploaded to the National Water Quality Portal (WQX) in 2022. It is the intention of WLI to upload all future data to WQX and this HydroShare resource may also be updated in the future with data for 2022 and forward.

    Data Purpose: The ‘Data’ folder of this resource holds the final data products for the extensive dataset collected by WLI between 2007 and 2021. This folder is likely of interest to users who want data for research and analysis purposes. This dataset contains physical water parameter field data collected by Hydrolab MS5 and DS5 loggers, including water temperature, specific conductance, dissolved oxygen concentration and saturation, barometric pressure, and turbidity. Additional field data that needs further quality assurance prior to use includes chlorophyll a, ORP, pH, and PAR. This dataset also contains water chemistry data analyzed at certified laboratories including total nitrogen, total phosphorus, nitrate, orthophosphate, total suspended solids, organic carbon, and chlorophyll a. The data folder includes R scripts with code for examples of data visualization. This dataset can provide insight to water quality trends in lakes and streams of northwestern Montana over time.

    Data Summary: During the time-period, WLI collected water quality data for 63 lake sites and 17 stream and river sites in northwestern Montana under two separate monitoring projects. The Northwest Montana Lakes Network (NMLN) project currently visits 41 lake sites in Northwestern Montana once per summer. Field data from Hydrolabs are collected at discrete depths throughout a lake's profile, and depth integrated water chemistry samples are collected as well. The Whitefish Water Quality Monitoring Project (WWQMP) currently visits two sites on Whitefish Lake, one site on Tally Lake, and 11 stream and river sites in the Whitefish Lake and Upper Whitefish River watersheds monthly between April and November. Field data is collected at one depth for streams and many depths throughout the lake profiles, and water chemistry samples are collected at discrete depths for Whitefish Lake and streams.
The final dataset for both programs includes over 112,000 datapoints of data passing quality assurance assessment and an additional 72,000 datapoints that would need further quality assurance before use.

    Workflow Purpose: The ‘Workflow’ folder of this resource contains the raw data, folder structure, and R code used during this data compilation and upload process. This folder is likely of interest to users who have similar datasets and are interested in code for automating data compilation or upload processes. The R scripts included here have code to stitch together many individual Hydrolab MS5 and DS5 logger files as well as lab electronic data deliverables (EDDs), which may be useful for users who are interested in compiling one or multiple seasons' worth of data into a single file. Reformatting scripts format data to match the multi-sheet excel workbook format required by the Montana Department of Environmental Quality for uploads to WQX, and may be useful to others hoping to automate database uploads.

    Workflow Summary: Compilation code in the workflow folder compiles data from its most original forms, including Hydrolab sonde export files and lab EDDs. This compilation process includes extracting dates and times from comment fields and producing a single file from many input files. Formatting code then reformats the data to match WQX upload requirements, which includes generating unique activity IDs for data collected at the same site, date, and time then linking these activity IDs with results across worksheets in an excel workbook. Code for generating all quality assurance figures used in the decision-making process outlined in the Quality Assurance Document and resulting data removal decisions are included here as well. Finally, this folder includes code for combining data from the separate program uploads for WQX to the more user-friendly structure for analysis provided in the 'Data' file for this HydroShare resource.
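    The compilation step described above (stitching many per-deployment export files into one table) can be sketched as follows. The resource's actual scripts are in R; this Python analogue uses a hypothetical CSV layout, not the real Hydrolab export or EDD format:

```python
import csv
import glob

def compile_logger_files(pattern, out_path):
    """Stitch many export CSVs sharing one header row into a single file."""
    rows, header = [], None
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)  # each file repeats the header
            if header is None:
                header = file_header
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)  # number of data rows compiled
```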

  13. US Crime Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated May 21, 2024
    Cite
    Bright Data (2024). US Crime Dataset [Dataset]. https://brightdata.com/products/datasets/crime/us
    Explore at:
    .json, .csv, .xlsx
    Available download formats
    Dataset updated
    May 21, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide, United States
    Description

    We will build you a custom US crime dataset based on your needs. Data points may include date, time, location, crime type, crime description, victim demographics, offender demographics, arrest records, charges filed, court outcomes, police department response time, incident outcome, weapon used, property stolen or damaged, crime location type, and other related data.

    Use our US crime datasets for a range of applications to enhance public safety and policy effectiveness. Analyzing these datasets can help organizations understand crime patterns and trends across different regions of the United States, enabling them to tailor their strategies and interventions accordingly. Depending on your needs, you may access the entire dataset or a customized subset.

    Popular use cases include: improving public safety measures, designing targeted crime prevention programs, resource allocation for law enforcement, and more.

  14. Modern China Geospatial Database - Main Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 28, 2025
    + more versions
    Cite
    Christian Henriot (2025). Modern China Geospatial Database - Main Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5735393
    Explore at:
    Dataset updated
    Feb 28, 2025
    Dataset authored and provided by
    Christian Henriot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    MCGD_Data_V2.2 contains all the data that we have collected on locations in modern China, plus a number of locations outside of China that we encounter frequently in historical sources on China. All further updates will appear under the name "MCGD_Data" with a time stamp (e.g., MCGD_Data2023-06-21)

    You can also have access to this dataset and all the datasets that the ENP-China makes available on GitLab: https://gitlab.com/enpchina/IndexesEnp

    Altogether there are 464,970 entries. The data include the name of locations and their variants in Chinese, pinyin, and any recorded transliteration; the name of the province in Chinese and in pinyin; Province ID; the latitude and longitude; the Name ID and Location ID, and NameID_Legacy. The Name IDs all start with H followed by seven digits. This is the internal ID system of MCGD (the NameID_Legacy column records the Name IDs in their original format depending on the source). Locations IDs that start with "DH" are data points extracted from China Historical GIS (Harvard University); those that start with "D" are locations extracted from the data points in Geonames; those that have only digits (8 digits) are data points we have added from various map sources.
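    The ID conventions above lend themselves to a small source classifier; a sketch, with return labels of our own choosing:

```python
import re

def id_source(identifier):
    """Infer the origin of an MCGD identifier from its prefix conventions."""
    if re.fullmatch(r"H\d{7}", identifier):
        return "MCGD Name ID"                     # H + seven digits
    if identifier.startswith("DH"):
        return "China Historical GIS (Harvard)"   # Location IDs starting DH
    if identifier.startswith("D"):
        return "Geonames"                         # Location IDs starting D
    if re.fullmatch(r"\d{8}", identifier):
        return "MCGD map sources"                 # eight digits only
    return "unknown"
```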

    One of the main features of the MCGD Main Dataset is the systematic collection and compilation of place names from non-Chinese language historical sources. Locations were designated in transliteration systems that are hardly comprehensible today, which makes it very difficult to find the actual locations they correspond to. This dataset allows for the conversion from these obsolete transliterations to the current names and geocoordinates.

    From June 2021 onward, we have adopted a different file naming system to keep track of versions. From MCGD_Data_V1 we have moved to MCGD_Data_V2. In June 2022, we introduced time stamps, which result in the following naming convention: MCGD_Data_YYYY.MM.DD.

    UPDATES

    MCGD_Data2025_02_28 includes a major change with the duplication of all the locations listed under Beijing, Shanghai, Tianjin, and Chongqing (北京, 上海, 天津, 重慶) and their listing under the name of the provinces to which they belonged originally, before the creation of the four special municipalities after 1949. This is meant to facilitate the matching of data from historical sources. Each location has a unique NameID. Altogether there are 472,818 entries.

    MCGD_Data2025_02_27 includes an update on locations extracted from Minguo zhengfu ge yuanhui keyuan yishang zhiyuanlu 國民政府各院部會科員以上職員錄 (Directory of staff members and above in the ministries and committees of the National Government). Nanjing: Guomin zhengfu wenguanchu yinzhuju 國民政府文官處印鑄局, 1944. We also made corrections in the Prov_Py and Prov_Zh columns, as there were some misalignments between the pinyin name and the name in Chinese characters. The file now includes 465,128 entries.

    MCGD_Data2024_03_23 includes an update on locations in Taiwan from the Asia Directories. Altogether there are 465,603 entries (of which 187 place names without geocoordinates, labelled in the Lat Long columns as "Unknown").

    MCGD_Data2023.12.22 contains all the data that we have collected on locations in China, whatever the period. Altogether there are 465,603 entries (of which 187 place names without geocoordinates, labelled in the Lat Long columns as "Unknown"). The dataset also includes locations outside of China for the purpose of matching such locations to the place names extracted from historical sources. For example, one may need to locate individuals born outside of China. Rather than maintaining two separate files, we made the decision to incorporate all the place names found in historical sources in the gazetteer. Such place names can easily be removed by selecting all the entries where the 'Province' data is missing.

  15. Human Resource Data Set (The Company)

    • kaggle.com
    Updated Jan 10, 2025
    Cite
    Koluit (2025). Human Resource Data Set (The Company) [Dataset]. https://www.kaggle.com/datasets/koluit/human-resource-data-set-the-company/versions/940
    Explore at:
    Croissant
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Koluit
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Similar to others who have created HR data sets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space. The only dataset most HR practitioners have is their real employee data, and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset with an ever-growing variation of data points, others can learn and grow their HR data analytics and systems knowledge.

    Some example test cases where someone might use this dataset:

    • HR technology testing and mock-ups: engagement survey tools, HCM tools, BI tools
    • Learning to code for People Analytics: Python/R/SQL
    • HR tech and People Analytics educational courses/tools

    Content

    The core file, CompanyData.txt, contains basic demographic data about each worker. We treat this as the core table that future data sets can be joined to.
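Joining a supplemental file onto the core table can be sketched with pandas. The column names below (EmployeeID, EngagementScore) are hypothetical illustrations, not taken from the actual files; consult the dataset's Readme.md for the real schema.

```python
import pandas as pd

# Stand-in for the core demographic table (CompanyData.txt); column names
# here are assumptions for illustration only.
core = pd.DataFrame({
    "EmployeeID": [101, 102, 103],
    "Name": ["A. Smith", "B. Jones", "C. Lee"],
    "Department": ["Sales", "Engineering", "HR"],
})

# A hypothetical supplemental data set, e.g. engagement survey results.
survey = pd.DataFrame({
    "EmployeeID": [101, 103],
    "EngagementScore": [4.2, 3.8],
})

# Left join keeps every worker; those without survey responses get a
# missing score rather than being dropped.
joined = core.merge(survey, on="EmployeeID", how="left")
```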

    Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.

    Acknowledgements

    Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.

    Inspiration

    Our hope is this data is used in the HR or Research space to experiment and learn using HR data. Some examples that we hope this data will be used are listed above.

    Contact Us

    Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com

  16. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Mar 27, 2025
    Cite
    Bright Data (2025). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions. Dataset Features

    • Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
    • Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
    • Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.

    Customizable Subsets for Specific Needs

    Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

    Popular Use Cases

    • Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
    • Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
    • Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
    • Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
    • AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  17. Data from: 2019 Distribution System Upgrade Unit Cost Database Current Version

    • catalog.data.gov
    • data.openei.org
    • +2more
    Updated Jul 24, 2025
    Cite
    National Renewable Energy Laboratory (2025). 2019 Distribution System Upgrade Unit Cost Database Current Version [Dataset]. https://catalog.data.gov/dataset/2019-distribution-system-upgrade-unit-cost-database-current-version-24139
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    IMPORTANT NOTE: This is the current version of NREL's Distribution System Upgrade Unit Cost Database and should be considered the most up-to-date. Compared to the previous version (https://data.nrel.gov/submissions/77), this database has additional data points and has been modified for improved usability. More information on the changes can be found in the attached file Unit_cost_database_guide_v2.docx. This guide also has important information about data sources and quality, as well as the intended use of the database. Please consult the guide before using this data for any purpose.

    This database contains unit cost information for different components that may be used to integrate distributed photovoltaic (DPV) systems onto distribution systems. Some of these upgrades and costs may also apply to the integration of other distributed energy resources (DER). Which components are required, and how many of each, is system-specific and should be determined by analyzing the effects of distributed PV at a given penetration level on the circuit of interest, in combination with engineering assessments of the efficacy of different solutions to increase the circuit's ability to host additional PV as desired. The current state of the distribution system should always be considered in these types of analysis.

    The data in this database was collected from a variety of utilities, PV developers, technology vendors, and published research reports. Where possible, we have included information on the source of each data point along with relevant notes. In some cases where the data provided is sensitive or proprietary, we were not able to specify the source, but we provide other information that may be useful to the user (e.g., year and location where the equipment was installed). NREL carefully reviewed these sources prior to inclusion in this database. - Originated 01/02/2019 by National Renewable Energy Laboratory

  18. UGround-V1-Data-Box

    • huggingface.co
    Updated May 4, 2025
    Cite
    OSU NLP Group (2025). UGround-V1-Data-Box [Dataset]. https://huggingface.co/datasets/osunlp/UGround-V1-Data-Box
    Explore at:
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    OSU NLP Group
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Updates

    [May 1, 2025] Bounding Box Data: We have added a bounding-box version of Web-Hybrid. For everyone's convenience, no conversation template is applied to this version of the data. All coordinates (x1, y1, x2, y2) are, as always, normalized to [0, 999]. The data has also been filtered (757k data points remain after content moderation).
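The stated coordinate convention (pixel-space box corners mapped to the integer range [0, 999]) can be sketched as below. This is an illustrative helper written from the description, not code from the dataset's release.

```python
def normalize_box(x1, y1, x2, y2, width, height):
    """Map pixel-space box corners (x1, y1, x2, y2) on a width x height
    screenshot to the [0, 999] integer range described for this dataset.
    A sketch of the stated convention, not official tooling."""
    return (
        round(x1 / width * 999),
        round(y1 / height * 999),
        round(x2 / width * 999),
        round(y2 / height * 999),
    )
```

For example, a box covering a full 1920x1080 screenshot maps to (0, 0, 999, 999).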

      Notes for Requests
    

    If you have applied for access to this dataset but have not received approval, please contact us via email (Boyu Gou)… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/UGround-V1-Data-Box.

  19. West Oakland Lead Sampling (Scribe) Data Points, West Oakland CA, 2018, U.S....

    • catalog.data.gov
    • datasets.ai
    Updated Feb 25, 2025
    Cite
    U.S. Environmental Protection Agency, Region 9 (Publisher) (2025). West Oakland Lead Sampling (Scribe) Data Points, West Oakland CA, 2018, U.S. EPA Region 9 [Dataset]. https://catalog.data.gov/dataset/west-oakland-lead-sampling-scribe-data-points-west-oakland-ca-2018-u-s-epa-region-913
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    California, Oakland, West Oakland, United States
    Description

    This feature class contains 436 points depicting lead sampling locations across West Oakland, California, taken during the 2018 Urban Metals Study. The U.S. Environmental Protection Agency (EPA) and the California Department of Toxic Substances Control (DTSC) have partnered on a project in West Oakland to study lead in soil. Lead is a heavy metal that is often found in urban soil. It usually comes from sources such as chipped paint on pre-1978 housing, historic pollution from leaded gasoline, or lead recycling (smelting). Children exposed to lead can have health problems, including impaired brain and physical development. In June 2018, EPA and DTSC took soil samples in city-owned property along streets, not private property, at nearly 200 randomly selected locations across West Oakland. This area was selected because of its mix of possible lead sources, including industry, older homes that may be painted with leaded paint, and nearby freeways.

  20. Introduction to Time Series Analysis for Hydrologic Data

    • beta.hydroshare.org
    • hydroshare.org
    • +1more
    zip
    Updated Jan 29, 2021
    Cite
    Gabriela Garcia; Kateri Salk (2021). Introduction to Time Series Analysis for Hydrologic Data [Dataset]. https://beta.hydroshare.org/resource/ee2a4c2151f24115a12e34d4d22d96fe/
    Explore at:
    Available download formats: zip (1.1 MB)
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    HydroShare
    Authors
    Gabriela Garcia; Kateri Salk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 1, 1974 - Jan 27, 2021
    Area covered
    Description

    This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.

    Introduction

    Time series are a special class of dataset in which a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its simplest, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will cover a few simple applications of time series analysis in these lessons.

    Opportunities

    Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:

    • Has there been an increasing or decreasing trend in the response variable over time?
    • Can we forecast conditions in the future?

      Challenges

    Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:

    • Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).

    • Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.

    • Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend. For example, we can expect summer water temperatures to be consistently higher than winter temperatures.

    • Heteroscedasticity: The variance of the time series is not constant over time.

    • Covariance: The covariance of the time series is not constant over time. Many time series models assume that both the variance and the covariance remain constant over time (compare heteroscedasticity above).
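The autocorrelation caveat above can be made concrete with a quick diagnostic on a synthetic series. This is a Python sketch with made-up data (the original lesson works in R with real hydrologic records): a series with trend and seasonality shows strong lag-1 autocorrelation, so its observations are not independent.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "monthly" series: upward trend + annual seasonality + noise
# (stand-in data, not from the lesson's hydrologic records).
t = np.arange(120)
series = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

def lag_autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# A lag-1 autocorrelation well above zero signals that each observation
# depends on the previous one, violating ordinary regression assumptions.
r1 = lag_autocorr(series, lag=1)
```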

      Learning Objectives

    After successfully completing this notebook, you will be able to:

    1. Choose appropriate time series analyses for trend detection and forecasting

    2. Discuss the influence of seasonality on time series analysis

    3. Interpret and communicate results of time series analyses
