https://crawlfeeds.com/privacy_policy
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.
Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.
CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.
Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.
Free & Easy Access: Priced at USD 0.00, making it a great starting point for developers, data analysts, or students (a quick loading sketch follows below).
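A minimal loading sketch, assuming the CSV is saved as walmart_products.csv and that the header follows the fields listed above (title, brand, category, price, ...); check the actual column names before relying on them:

```python
import pandas as pd

# Load the dataset; the file name is an assumption for illustration.
df = pd.read_csv("walmart_products.csv")
print(df.shape)              # expect roughly (2100, number_of_fields)
print(df.columns.tolist())   # verify the actual header

# Example: average price per category, assuming "category" and "price" columns exist.
print(df.groupby("category")["price"].mean().sort_values(ascending=False).head())
```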
This is the sample database from sqlservertutorial.net. It is a great dataset for learning SQL and practicing querying relational databases.
Database Diagram: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media
The sample database is copyrighted and cannot be used for commercial purposes, including but not limited to: - Selling it - Including it in paid courses
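As a practice example, here is a hedged sketch of querying a locally restored copy from Python with pyodbc; the connection string and the sales/production table names are assumptions based on the BikeStores schema shown in the diagram above, so adjust them to your setup:

```python
import pyodbc

# Connect to a locally restored copy of the sample database (assumed name: BikeStores).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=BikeStores;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Top five products by units sold; table/column names assumed from the diagram.
cursor.execute("""
    SELECT TOP 5 p.product_name, SUM(oi.quantity) AS units_sold
    FROM sales.order_items AS oi
    JOIN production.products AS p ON p.product_id = oi.product_id
    GROUP BY p.product_name
    ORDER BY units_sold DESC;
""")
for name, units in cursor.fetchall():
    print(name, units)
```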
A small excerpt from one real dataset, with a few changes made to protect students' private information. Permission to share has been given.
Using only this data, you are going to help teachers with two tasks: 1. Prediction: determine what makes a brilliant student who can get into a graduate school, whether abroad or not. 2. Application: help those who fail to get into a graduate school with advice on job searching.
Some of the original fields were deleted or censored. Those that remain are:
Basic data:
- ID
- class: categorical; students were initially divided into 2 classes, yet teachers suspect that students in different classes may perform significantly differently
- gender
- race: categorical, censored
- GPA: real numbers (float)
Some teachers assume that scores in math courses can perfectly represent one's likelihood of admission:
- Algebra: real numbers
- Advanced Algebra
- ......
Some assume that students' backgrounds can significantly affect their choices and likelihood; these fields are all censored:
- from1: students' home locations
- from2: a probably weak indicator of preference for mathematics
- from3: how students applied to this university (undergraduate)
- from4: a probably weak indicator of family background; 0 means more wealth, 4 means more poverty
The final indicator y:
- 0: fails to get into a graduate school; may apply again or search for a job in the future
- 1: success, inland
- 2: success, abroad
A minimal prediction sketch follows this list.
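This is a hedged sketch of the prediction task, assuming a file name and exact headers (ID, class, gender, race, GPA, the math scores, from1-from4, and y) matching the description above; treat it as a starting point, not a reference solution:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("students.csv")                  # hypothetical file name
X = pd.get_dummies(df.drop(columns=["ID", "y"]))  # one-hot encode the categorical fields
y = df["y"]                                       # 0 = not admitted, 1 = inland, 2 = abroad

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```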
https://www.zionmarketresearch.com/privacy-policy
The NoSQL database market was valued at USD 9.38 billion in 2023 and is projected to reach USD 86.48 billion by 2032, at a CAGR of 28% from 2023 to 2032.
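As a quick consistency check of those figures, compounding the 2023 value at the stated CAGR over the nine years to 2032 lands close to the projected total:

```python
# value_2032 = value_2023 * (1 + CAGR) ** years
print(9.38 * (1 + 0.28) ** 9)  # ~86.5, consistent with the projected USD 86.48 billion
```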
https://www.technavio.com/content/privacy-notice
Data Science Platform Market Size 2025-2029
The data science platform market is estimated to grow by USD 763.9 million at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the market.
Major Market Trends & Insights
North America dominated the market and is expected to contribute 48% of the market's growth during the forecast period.
By Deployment - On-premises segment was valued at USD 38.70 million in 2023
By Component - Platform segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 1.00 million
Market Future Opportunities: USD 763.90 million
CAGR: 40.2%
North America: Largest market in 2023
Market Summary
The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations.
According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.
What will be the Size of the Data Science Platform Market during the forecast period?
How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?
The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
On-premises
Cloud
Component
Platform
Services
End-user
BFSI
Retail and e-commerce
Manufacturing
Media and entertainment
Others
Sector
Large enterprises
SMEs
Application
Data Preparation
Data Visualization
Machine Learning
Predictive Analytics
Data Governance
Others
Geography
North America
US
Canada
Europe
France
Germany
UK
Middle East and Africa
UAE
APAC
China
India
Japan
South America
Brazil
Rest of World (ROW)
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period.
In this dynamic and evolving market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.
Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.
API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.
The On-premises segment was valued at USD 38.70 million in 2019.
https://creativecommons.org/publicdomain/zero/1.0/
HR analytics, also referred to as people analytics, workforce analytics, or talent analytics, involves gathering, analyzing, and reporting HR data. It is the collection and application of talent data to improve critical talent and business outcomes. It enables your organization to measure the impact of a range of HR metrics on overall business performance and to make decisions based on data. HR analysts are primarily responsible for interpreting and analyzing these vast datasets.
Download the CSV data files here: https://drive.google.com/drive/folders/18mQalCEyZypeV8TJeP3SME_R6qsCS2Og
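A minimal sketch of one typical HR metric, assuming hypothetical columns "department" and "attrition" (1 = left, 0 = stayed) in one of the downloaded CSVs; the real files may use different names:

```python
import pandas as pd

hr = pd.read_csv("hr_data.csv")  # hypothetical file from the folder above
# Share of employees who left, per department.
attrition_by_dept = hr.groupby("department")["attrition"].mean().sort_values(ascending=False)
print(attrition_by_dept)
```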
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and supporting user needs.

A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip"), an associated item of PAD-US 3.0 Spatial Analysis and Statistics (https://doi.org/10.5066/P9KLBB5D), was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries.

Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"), and Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format, enabling users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard (https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D).

Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas (https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html), agencies are the best source of their lands data.
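A minimal sketch of the overlap-removal idea described above (not the USGS script itself): sort records by conservation status, public access, and load order, then clip each geometry by everything already kept. The layer name and the "GAP_Sts"/"Pub_Access" field values are assumptions to verify against the actual geodatabase:

```python
import geopandas as gpd

# Assumed Pub_Access codes, ranked Closed < Restricted < Open < Unknown.
ACCESS_RANK = {"XA": 0, "RA": 1, "OA": 2, "UK": 3}

def remove_overlaps(padus: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    # Priority: lower GAP status code first, then access rank, then load order.
    ordered = padus.sort_values(["GAP_Sts", "access_rank", "load_order"]).reset_index(drop=True)
    kept, claimed = [], None
    for geom in ordered.geometry:
        remainder = geom if claimed is None else geom.difference(claimed)
        kept.append(remainder)
        claimed = remainder if claimed is None else claimed.union(remainder)
    ordered = ordered.set_geometry(gpd.GeoSeries(kept, crs=padus.crs))
    return ordered[~ordered.geometry.is_empty]

padus = gpd.read_file("PADUS3_0.gdb", layer="Combined_Fee_Designation_Easement")  # hypothetical layer name
padus["access_rank"] = padus["Pub_Access"].map(ACCESS_RANK)
padus["load_order"] = range(len(padus))
flattened = remove_overlaps(padus)
```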
analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
this new github repository contains five scripts:

- 1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party
- import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
- longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, then perform a mountain of analysis examples with wave weights from two different points in the panel
- import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, then save the file as an R data file (.rda) for fast loading later
- replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs

notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Digital data from the political sphere is abundant, omnipresent, and more and more directly accessible through the Internet. Project Vote Smart (PVS) is a prominent example of this big public data and covers various aspects of U.S. politics in astonishing detail. Despite the vast potential of PVS’ data for political science, economics, and sociology, it is hardly used in empirical research. The systematic compilation of semi-structured data can be complicated and time consuming as the data format is not designed for conventional scientific research. This paper presents a new tool that makes the data easily accessible to a broad scientific community. We provide the software called pvsR as an add-on to the R programming environment for statistical computing. This open source interface (OSI) serves as a direct link between a statistical analysis and the large PVS database. The free and open code is expected to substantially reduce the cost of research with PVS’ new big public data in a vast variety of possible applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as already existing interfaces. The validity of the library is documented based on an illustration involving female representation in local politics. In addition, pvsR facilitates the replication of research with PVS data at low costs, including the pre-processing of data. Similar OSIs are recommended for other big public databases.
https://www.zionmarketresearch.com/privacy-policy
The global in-memory database market is expected to generate revenue of around USD 36.21 billion by 2032, growing at a CAGR of 19.2% between 2024 and 2032.
Database of the nation's substance abuse and mental health research data providing public use data files, file documentation, and access to restricted-use data files to support a better understanding of this critical area of public health. The goal is to increase the use of the data to most accurately understand and assess substance abuse and mental health problems and the impact of related treatment systems. The data include the U.S. general and special populations, annual series, and designs that produce nationally representative estimates. Some of the data acquired and archived have never before been publicly distributed. Each collection includes survey instruments (when provided), a bibliography of related literature, and related Web site links. All data may be downloaded free of charge in SPSS, SAS, STATA, and ASCII formats and most studies are available for use with the online data analysis system. This system allows users to conduct analyses ranging from cross-tabulation to regression without downloading data or relying on other software. Another feature, Quick Tables, provides the ability to select variables from drop down menus to produce cross-tabulations and graphs that may be customized and cut and pasted into documents. Documentation files, such as codebooks and questionnaires, can be downloaded and viewed online.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
[Note: Integrated as part of FoodData Central, April 2019.]
USDA's Food and Nutrient Database for Dietary Studies (FNDDS) is a database that is used to convert foods and beverages consumed in What We Eat In America (WWEIA), National Health and Nutrition Examination Survey (NHANES), into gram amounts and to determine their nutrient values. Because FNDDS is used to generate the nutrient intake data files for WWEIA, NHANES, researchers do not need it to estimate nutrient intakes from the survey. FNDDS is made available so that researchers using WWEIA, NHANES can review the nutrient profiles for specific foods and beverages as well as their associated portions and recipes. Such detailed information makes it possible for researchers to conduct enhanced analyses of dietary intakes. FNDDS can also be used in other dietary studies to code foods/beverages and amounts eaten and to calculate the amounts of nutrients/food components in those items.
FNDDS is released every two years in conjunction with the WWEIA, NHANES dietary data release. The FNDDS is available for free download from the FSRG website.
Resources in this dataset: Resource Title: Website Pointer to Food and Nutrient Database for Dietary Studies. File Name: Web Page. URL: https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/fndds/
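A minimal sketch of what FNDDS enables, converting a reported intake into nutrient amounts; the column names and example values are illustrative placeholders, not the actual FNDDS table layout:

```python
import pandas as pd

intake = pd.DataFrame({"food_code": [11111000], "grams": [244.0]})               # one reported food
nutrients = pd.DataFrame({"food_code": [11111000], "protein_per_100g": [3.28]})  # placeholder value

merged = intake.merge(nutrients, on="food_code")
merged["protein_g"] = merged["grams"] * merged["protein_per_100g"] / 100  # grams consumed x per-100 g value
print(merged[["food_code", "grams", "protein_g"]])
```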
Success.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.
Key Features of Success.ai's Company Financial Data:
Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.
Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.
Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.
Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.
Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.
Why Choose Success.ai for Company Financial Data?
Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.
AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.
Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.
Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.
Comprehensive Use Cases for Financial Data:
Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.
Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.
Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.
Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.
Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.
APIs to Power Your Financial Strategies:
Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.
Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns. (A hypothetical call sketch follows below.)
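A purely hypothetical sketch of what such an API call could look like from Python; the endpoint URL, payload fields, and response shape are illustrative placeholders, not Success.ai's documented API, so consult their documentation for the real interface:

```python
import requests

resp = requests.post(
    "https://api.success.ai/v1/enrich",        # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"domain": "example.com"},            # hypothetical payload
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # enriched company financial profile (shape unspecified)
```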
Tailored Solutions for Industry Professionals:
Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.
Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.
Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.
Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.
What Sets Success.ai Apart?
Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.
Ethical Practices: Our data collection and processing methods are fully comp...
Nowadays web portals play an essential role in searching and retrieving information in several fields of knowledge: they are ever more technologically advanced and designed to support the storage of a huge amount of natural-language information originating from the queries launched by users worldwide. A good example is given by the WorldWideScience search engine: the database is available at http://worldwidescience.org/. It is based on a similar gateway, Science.gov, which is the major path to U.S. government science information, as it pulls together Web-based resources from various agencies. The information in the database is intended to be of high quality and authority, as well as the most current available from the participating countries in the Alliance, so users will find that the results will be more refined than those from a general search of Google. It covers the fields of medicine, agriculture, the environment, and energy, as well as basic sciences. Most of the information may be obtained free of charge (the database itself may be used free of charge) and is considered "open domain." As of this writing, there are about 60 countries participating in WorldWideScience.org, providing access to 50+ databases and information portals. Not all content is in English. (Bronson, 2009)

Given this scenario, we focused on building a corpus constituted by the query logs registered by the GreyGuide: Repository and Portal to Good Practices and Resources in Grey Literature and received by the WorldWideScience.org (The Global Science Gateway) portal: the aim is to retrieve information related to social media, which as of today represents a considerable source of data more and more widely used for research ends. This project includes eight months of query logs registered between July 2017 and February 2018, for a total of 445,827 queries. The analysis mainly concentrates on the semantics of the queries received from the portal clients: it is a process of information retrieval from a rich digital catalogue whose language is dynamic, is evolving, and follows - as well as reflects - the cultural changes of our modern society.
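A minimal sketch of the kind of frequency analysis such a corpus supports, assuming the logs are exported as one query per line in a plain-text file (the project's actual format may differ):

```python
from collections import Counter

with open("greyguide_queries.txt", encoding="utf-8") as f:  # hypothetical export
    terms = Counter(term for line in f for term in line.lower().split())

# Ten most frequent query terms across the 445,827 logged queries.
for term, count in terms.most_common(10):
    print(f"{term}\t{count}")
```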
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

Database Management Systems

As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world's growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
At the Institute of Cybernetics, Mathematics, and Physics in the Republic of Cuba, the course "Databases and Digital Libraries" is a discipline in the Master's degree program in Applied Cybernetics. An essential part of the course is the creation of document databases built from information retrieved from the Internet. To equip the laboratories required for better learning, the most suitable tools for information retrieval are needed, judged both from an educational point of view and by ease of acquisition. Therefore, the characteristics used to evaluate these tools and the methodology for selecting them were defined. As a result, of the thirteen freely downloadable retrieval and data-analysis tools, the following eight were selected: Lemur Toolkit with Indri, Sphinx, WebSphinx with RapidMiner, Solr/Lucene/Hadoop/Mahout, Terrier, and Dragon, which guaranteed the quality of the course and the connection with other courses in the Master's degree program.
The MarketScan Commercial Database (previously called the 'MarketScan Database') contains real-world data for healthcare research and analytics to examine health economics and treatment outcomes.
This page also contains the MarketScan Commercial Lab Database starting in 2018.
Starting in 2026, there will be a data access fee for using the full dataset. Please refer to the 'Usage Notes' section of this page for more information.
MarketScan Research Databases are a family of data sets that fully integrate many types of data for healthcare research.
The MarketScan Databases track millions of patients throughout the healthcare system. The data are contributed by large employers, managed care organizations, hospitals, EMR providers, and Medicare.
This page contains the MarketScan Commercial Database.
Related MarketScan databases are available on other pages.
**Starting in 2026, there will be a data access fee for using the full dataset** (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this FAQ Sheet.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
Soil spectroscopy has emerged as a solution to the limitations associated with traditional soil surveying and analysis methods, addressing the challenges of time and financial resources. Analyzing the soil's spectral reflectance makes it possible to observe the soil composition and simultaneously evaluate several attributes, because matter, when exposed to electromagnetic energy, leaves a "spectral signature" that makes such evaluations possible. The Soil Spectral Library (SSL) consolidates soil spectral patterns from a specific location, facilitating accurate modeling and reducing time, cost, chemical products, and waste in surveying and mapping processes. Therefore, an open-access SSL benefits society by providing a fine collection of free data for multiple applications, for both research and commercial use.
BSSL Description and Usefulness
The Brazilian Soil Spectral Library (BSSL), available at https://bibliotecaespectral.wixsite.com/english, is a comprehensive repository of soil spectral data. Coordinated by JAM Demattê and managed by the GeoCiS research group, the BSSL was initiated in 1995 and published by Demattê and collaborators in 2019. This initiative stands out due to its coverage of diverse soil types, given Brazil's significance in the agricultural and environmental domains and its status as the fifth largest territory in the world (IBGE, 2023). In addition, a Middle Infrared (MIR) dataset has been published (Mendes et al., 2022), part of which is included in this repository. The database covers 16,084 sites and includes harmonized physicochemical and spectral (Vis-NIR-SWIR and MIR range) soil data from various sources at 0-20 cm depth. All soil samples have Vis-NIR-SWIR data, but not all have MIR data.
The BSSL provides open and free access to curated data for the scientific community and interested individuals. Unrestricted access to the BSSL supports researchers in validating their results by comparing measured data with predicted values. This initiative also facilitates the development of new models and the improvement of existing ones. Moreover, users can employ the library to test new models and extract information about previously unknown soil properties. With its extensive coverage of tropical soil classes, the BSSL is considered one of the most significant soil spectral libraries worldwide, with 42 institutions and 61 researchers participating. However, 47 collaborators from 29 institutions have authorized the data opening. Other researchers can also provide their data upon request through the coordinator of this initiative.
The data from the BSSL project can also help wet labs to improve their analytical capabilities, contributing to developing hybrid wet soil laboratory techniques and digital soil maps while informing decision-makers in formulating conservation and land use policies. The soil's capacity for different land uses promotes soil health and sustainability.
Coverage
The BSSL data covers all regions of Brazil, including 26 states and the Federal District. The database is in .xlsx format and has a total size of 305 MB. The table is structured in sheets, with rows for observations and columns representing various soil attributes in the surface layer, from 0 to 20 cm depth. The database includes environmental and physicochemical properties (20 columns and 16,084 rows), Vis-NIR-SWIR spectral bands (2151 columns and 16,084 rows), and MIR channels (681 columns and 1783 rows). A unique ID column can be used to merge the sheets for each attribute or spectral range, as in the sketch below.
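A minimal sketch of that merge, assuming the workbook name and the sheet/column labels described in the "Dataset characterization" section below (the sheet indexes for the spectral data are placeholders):

```python
import pandas as pd

workbook = "BSSL_DB_Key_Soils.xlsx"
attrs = pd.read_excel(workbook, sheet_name="BSSL_Soil_Attributes_Dataset")
visnir = pd.read_excel(workbook, sheet_name=1)  # Vis-NIR-SWIR bands (2151 columns), index assumed
mir = pd.read_excel(workbook, sheet_name=2)     # MIR channels (681 columns), index assumed

# All sites have Vis-NIR-SWIR data; MIR exists only for a subset, hence the left merge.
df = attrs.merge(visnir, on="ID_unique").merge(mir, on="ID_unique", how="left")
print(df.shape)
```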
Accessing original data source
Use of these data requires citing them in every situation, under penalty of copyright infringement. Three mechanisms are available for users to reach the original and complete data contributors:
a) Refer to sheet two for name and code-based searches;
b) Visit the website https://bibliotecaespectral.wixsite.com/english/lista-de-cedentes or locate the contributors' list by Brazilian state;
c) Visit the website of the Brazilian Soil Spectral Service – Braspecs http://www.besbbr.com.br/, an online platform for soil analysis that uses part of the current SSL (Demattê et al., 2022) - It was developed and managed by GeoCiS. There, owners from all over the country can be found.
Proceeding to data analysis
We registered and organized the samples at the ESALQ/USP Soil Laboratory. Some samples arrived without preliminary data analyses, so we analyzed them for soil organic matter (SOM), granulometry, cation exchange capacity (CEC), pH in water, and the presence of Ca, Mg, and Na, following the recommendations of Donagemma et al. (2011).
The GeoCiS research group performed spectral analyses following the procedures described by Bellinaso et al. (2010). Demattê et al. (2019) provide detailed methods for sampling, preparation, and soil analyses, including reflectance spectroscopy. Latitude and longitude data can be requested directly from the data owner. In summary, the following steps are involved in data acquisition.
a) We subjected the soil samples to a preliminary treatment, which involved drying them in an oven at 45°C for 48 hours, grinding them, and sieving them through a 2mm mesh;
b) We placed the samples in Petri dishes with a diameter of 9 cm and a height of 1.5 cm;
c) We homogenized and flattened the surface of the samples to reduce the shading caused by larger particles or foreign bodies, making them ready for spectral readings;
d) The spectral analyses took place in a darkened room to avoid interference from natural light. We used a computer to record the electromagnetic pulses through an optical fiber connected to the sensor, capturing the spectral response of the soil sample;
e) We obtained reflectance data in the Visible-Near Infrared-Shortwave Infrared (Vis-NIR-SWIR) range using a FieldSpec 3 spectroradiometer (Analytical Spectral Devices, ASD, Boulder, CO), which operates in the spectral range from 350 to 2500 nm;
f) The sensor had a spectral resolution of 3 nm from 350-700 nm and 10 nm from 700-2500 nm, automatically interpolated to 1 nm spectral resolution in the output data, resulting in 2151 channels (or bands); and
g) We positioned the lamps at 90° from each other and 35 cm away from the sample, with a zenith angle of 30°.
The sensor captured the light reflected through the fiber optic cable, which was positioned 8 cm from the sample's surface.
We used two 50W halogen lamps as the power source for the artificial light. It's important to note that we took three readings for each sample at different positions by rotating the Petri dish by 90°.
Each reading represents the average of 100 scans taken by the sensor. From these three readings, we calculated the final spectrum of the samples. Notably, the laboratory's equipment and procedures for soil sample spectral analyses followed the ASD's recommendations, particularly about sensor calibration using a white spectralon plate as a 100% reflectance standard.
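A minimal numpy sketch of that final averaging step; the array is a stand-in for three measured Vis-NIR-SWIR readings of 2151 bands each:

```python
import numpy as np

readings = np.random.rand(3, 2151)      # placeholder for three readings (each the mean of 100 scans)
final_spectrum = readings.mean(axis=0)  # per-band average -> one spectrum of shape (2151,)
print(final_spectrum.shape)
```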
For the analysis in the Middle Infrared (MIR) spectral region, we followed the procedures outlined by Mendes et al. (2022). We milled the soil fraction smaller than 2 mm, sieved it to 0.149 mm, and scanned it using a Fourier Transform Infrared (FT-IR) alpha spectroradiometer (Bruker Optics Corporation, Billerica, MA 01821, USA) equipped with a DRIFT accessory.
The spectroradiometer measured the diffuse reflectance using Fourier transformation in the spectral range from 4000 cm-1 to 600 cm-1, with a resolution of 2 cm-1. We conducted these measurements in the Geotechnology Laboratory of the Department of Soil Science at Esalq-USP. We took the average of 32 successive readings to obtain a soil spectrum. Sensor calibration took place before each spectral acquisition of the sample set by standardizing it against the maximum reflectance of a gold plate.
Dataset characterization
The database, named BSSL_DB_Key_Soils, has five sheets containing the key soil attributes, the Vis-NIR-SWIR and MIR datasets, descriptions of the contributors, and the proximal sensing methods used for spectral soil analysis. The sheets can be linked by the "ID_unique" column, which matches the corresponding rows according to the data type. Some cells are empty because that is how the collaborators provided the data; we have decided to keep these records in the database because they have other key soil attributes. Every column in the data sheets is described as follows:
Sheet 1. BSSL_Soil_Attributes_Dataset
Column 1. ID_unique: Sequential code assigned to every record;
Column 2. Owner code: Acronym assigned to each contributor who allowed access to their proprietary data;
Column 3. Vis_NIR_SWIR_availability: availability of spectral data in visible, near-infrared, and shortwave infrared ranges;
Column 4. MIR_availability: availability of spectral data in the middle infrared range;
Column 5. Sampling: type of soil sampling;
Column 6. Depth_cm: soil surface layer depth in centimeters;
Column 7. Region: Brazilian geographical region of samples' source;
Column 8. Municipality: Brazilian municipality of samples' source;
Column 9. State: Brazilian Federation Unit of samples'
➡️ You can choose from multiple data formats, delivery frequency options, and delivery methods;
➡️ You can select raw or clean and AI-enriched datasets;
➡️ Multiple APIs designed for effortless search and enrichment (accessible using a user-friendly self-service tool);
➡️ Fresh data: daily updates, easy change tracking with dedicated data fields, and a constant flow of new data;
➡️ You get all necessary resources for evaluating our data: a free consultation, a data sample, or free credits for testing our APIs.
Coresignal's employee data enables you to create and improve innovative data-driven solutions and extract actionable business insights. These datasets are popular among companies from different industries, including HR and sales technology and investment.
Employee Data use cases:
✅ Source best-fit talent for your recruitment needs
Coresignal's Employee Data can help source the best-fit talent for your recruitment needs by providing the most up-to-date information on qualified candidates globally.
✅ Fuel your lead generation pipeline
Enhance lead generation with 712M+ up-to-date employee records from the largest professional network. Our Employee Data can help you develop a qualified list of potential clients and enrich your own database.
✅ Analyze talent for investment opportunities
Employee Data can help you generate actionable signals and identify new investment opportunities earlier than competitors or perform deeper analysis of companies you're interested in.
➡️ Why 400+ data-powered businesses choose Coresignal:
The NIST Electron Inelastic-Mean-Free-Path Database provides values of electron inelastic mean free paths (IMFPs) principally for use in surface analysis by Auger-electron spectroscopy and X-ray photoelectron spectroscopy. The database includes IMFPs calculated from experimental optical data and IMFPs measured by elastic-peak electron spectroscopy. If no calculated or measured IMFPs are available for a material of interest, values can be estimated from the predictive IMFP formulae of Tanuma et al. and of Gries. IMFPs are available for electron energies between 50 eV and 10,000 eV although most of the available data are for energies less than 2,000 eV. A critical review of calculated and measured IMFPs has been published [C. J. Powell and A. Jablonski, J. Phys. Chem. Ref. Data 28, 19 (1999)].
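For orientation, here is a sketch of the TPP-2M functional form of Tanuma, Powell, and Penn, one family of predictive formulae the database refers to; the material-dependent parameters must come from the TPP-2M expressions or from fits, so the signature below is illustrative only:

```python
import math

def imfp_tpp2m(E, E_p, beta, gamma, C, D):
    """Inelastic mean free path (angstroms) for electron energy E (eV):

        lambda = E / (E_p**2 * (beta*ln(gamma*E) - C/E + D/E**2))

    E_p is the free-electron plasmon energy (eV); beta, gamma, C, and D are
    material-dependent TPP-2M parameters (supply real values, not placeholders).
    """
    return E / (E_p**2 * (beta * math.log(gamma * E) - C / E + D / E**2))
```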