Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vitamin D insufficiency appears to be prevalent in patients with systemic lupus erythematosus (SLE). Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLE Disease Activity Index (SLEDAI) score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83, 3.52).
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
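The bootstrapped confidence intervals reported for RMSE and MAE can be illustrated with a minimal sketch. It assumes a percentile bootstrap over (observed, predicted) pairs; the description does not say which bootstrap variant the authors used, and the values below are invented for illustration, not study data. The helper names `rmse`, `mae`, and `bootstrap_ci` are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample (truth, prediction) pairs with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical held-out vitamin D values and model predictions (NOT study data).
y_true = [18.0, 22.5, 30.1, 15.2, 27.8, 20.4, 24.9, 19.3, 33.0, 16.7]
y_pred = [20.1, 21.0, 27.5, 17.9, 25.2, 23.3, 22.8, 18.1, 29.6, 19.4]

point = rmse(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, rmse)
print(f"RMSE = {point:.2f} (95% CI: {low:.2f}, {high:.2f})")
```

With only 50 patients, intervals of this kind are wide, which is consistent with the overlapping LR and RF intervals quoted above.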
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
With data on school locations, categories, and contact information, analysts can explore various aspects of public school distribution, accessibility, and resource allocation. The geographical data allows for mapping and spatial analysis, which can help identify areas with higher concentrations of schools or regions that may lack adequate public education facilities. This dataset's uniform structure makes it suitable for integration with other demographic or socioeconomic datasets, enabling more nuanced analysis of educational accessibility and equity. Several analyses can be performed using this dataset:
- Descriptive Statistics: To provide a summary of the dataset, including the number of schools by category, average number of schools per ZIP code, and other basic statistics.
- Cluster Analysis: To group schools based on similar characteristics such as location, school type (high, middle, elementary), and size to identify patterns in school distribution.
- Accessibility Analysis: To evaluate the ease of access to public schools for students in different areas, considering factors such as distance to schools and availability of public transportation.
- Demographic and Socioeconomic Impact Analysis: To understand how demographic and socioeconomic factors influence the distribution and accessibility of public schools.
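The descriptive and accessibility analyses listed above can be sketched in a few lines. This is only an illustration: the records and the field names (`name`, `category`, `zip`, `lat`, `lon`) are hypothetical, since the dataset's actual schema is not given here, and straight-line (haversine) distance is used as a rough stand-in for real travel distance.

```python
import math
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
schools = [
    {"name": "School A", "category": "Elementary", "zip": "00001", "lat": 40.00, "lon": -75.00},
    {"name": "School B", "category": "Elementary", "zip": "00001", "lat": 40.02, "lon": -75.01},
    {"name": "School C", "category": "Middle", "zip": "00002", "lat": 40.10, "lon": -75.10},
    {"name": "School D", "category": "High", "zip": "00003", "lat": 40.20, "lon": -75.20},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_school(lat, lon, records):
    """Return the record closest to the given point by straight-line distance."""
    return min(records, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


# Descriptive statistics: schools per category and average schools per ZIP code.
by_category = Counter(s["category"] for s in schools)
per_zip = Counter(s["zip"] for s in schools)
avg_per_zip = sum(per_zip.values()) / len(per_zip)

print(dict(by_category), round(avg_per_zip, 2))
print(nearest_school(40.09, -75.09, schools)["name"])
```

A fuller accessibility analysis would replace the haversine distance with road-network or transit travel times.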
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains customer satisfaction scores collected from a survey, alongside key demographic and behavioral data. It includes variables such as customer age, gender, location, purchase history, support contact status, loyalty level, and satisfaction factors. The dataset is designed to help analyze customer satisfaction, identify trends, and develop insights that can drive business decisions.
File Information:
File Name: customer_satisfaction_data.csv
File Type: CSV
Number of Rows: 120
Number of Columns: 10
Column Names:
Customer_ID – Unique identifier for each customer (e.g., 81-237-4704)
Group – The group to which the customer belongs (A or B)
Satisfaction_Score – Customer's satisfaction score on a scale of 1-10
Age – Age of the customer
Gender – Gender of the customer (Male, Female)
Location – Customer's location (e.g., Phoenix.AZ, Los Angeles.CA)
Purchase_History – Whether the customer has made a purchase (Yes or No)
Support_Contacted – Whether the customer has contacted support (Yes or No)
Loyalty_Level – Customer's loyalty level (Low, Medium, High)
Satisfaction_Factor – Primary factor contributing to customer satisfaction (e.g., Price, Product Quality)
Statistical Analyses:
Descriptive Statistics:
Calculate mean, median, mode, standard deviation, and range for key numerical variables (e.g., Satisfaction Score, Age).
Summarize categorical variables (e.g., Gender, Loyalty Level, Purchase History) with frequency distributions and percentages.
Two-Sample t-Test (Independent t-test):
Compare the mean satisfaction scores between two independent groups (e.g., Group A vs. Group B) to determine if there is a significant difference in their average satisfaction scores.
Paired t-Test:
If there are two related measurements (e.g., satisfaction scores before and after a certain event), you can compare the means using a paired t-test.
One-Way ANOVA (Analysis of Variance):
Test if there are significant differences in mean satisfaction scores across more than two groups (e.g., comparing the mean satisfaction score across different Loyalty Levels).
Chi-Square Test for Independence:
Examine the relationship between two categorical variables (e.g., Gender vs. Purchase History or Loyalty Level vs. Support Contacted) to determine if there’s a significant association.
Mann-Whitney U Test:
For non-normally distributed data, use this test to compare satisfaction scores between two independent groups (e.g., Group A vs. Group B) to see if their distributions differ significantly.
Kruskal-Wallis Test:
Similar to ANOVA, but used for non-normally distributed data. This test can compare the median satisfaction scores across multiple groups (e.g., comparing satisfaction scores across Loyalty Levels or Satisfaction Factors).
Spearman’s Rank Correlation:
Test for a monotonic relationship between two ordinal or continuous variables (e.g., Age vs. Satisfaction Score or Satisfaction Score vs. Loyalty Level).
Regression Analysis:
Linear Regression: Model the relationship between a continuous dependent variable (e.g., Satisfaction Score) and independent variables (e.g., Age, Gender, Loyalty Level).
Logistic Regression: If analyzing binary outcomes (e.g., Purchase History or Support Contacted), you could model the probability of an outcome based on predictors.
Factor Analysis:
To identify underlying patterns or groups in customer behavior or satisfaction factors, you can apply Factor Analysis to reduce the dimensionality of the dataset and group similar variables.
Cluster Analysis:
Use K-Means Clustering or Hierarchical Clustering to group customers based on similarity in their satisfaction scores and other features (e.g., Loyalty Level, Purchase History).
Confidence Intervals:
Calculate confidence intervals for the mean of satisfaction scores or any other metric to estimate the range in which the true population mean might lie.
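Two of the analyses listed above, the independent two-sample t-test and the confidence interval for a mean, can be sketched with the standard library alone. The sketch below uses Welch's unequal-variance form of the t statistic and a normal-approximation interval; both are choices made here for illustration, not something the dataset description prescribes. The rows are invented and only mimic the `Group` and `Satisfaction_Score` columns.

```python
import math
import statistics
from statistics import NormalDist


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def mean_ci(values, level=0.95):
    """Normal-approximation CI for the mean (a t quantile would be wider for small n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return m - half, m + half


# Hypothetical rows mimicking the Group and Satisfaction_Score columns.
rows = [("A", 7), ("A", 8), ("A", 6), ("A", 9), ("A", 7),
        ("B", 5), ("B", 6), ("B", 7), ("B", 4), ("B", 6)]
group_a = [score for group, score in rows if group == "A"]
group_b = [score for group, score in rows if group == "B"]

t_stat, df = welch_t(group_a, group_b)
ci_low, ci_high = mean_ci(group_a)
print(f"t = {t_stat:.3f}, df = {df:.1f}, Group A mean CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Turning the t statistic into a p-value needs the t distribution, which the standard library does not provide; a statistics package would typically be used for that step, and for the non-parametric alternatives when scores are clearly non-normal.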
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Veteran Status for the U.S., States, Metro Areas, Counties, and Places: 2023
Table ID: ABSNESD2023.AB00MYNESD01D
Survey/Program: Economic Surveys
Year: 2023
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2023 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-11-20
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation
Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms).
Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series).
Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2024 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered.
Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2024 ABS collection year produces statistics for the 2023 reference year. The "Year" column in the table is the reference year.
Methodology
Data Items and Other Identifying Records:
- Total number of employer and nonemployer firms
- Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
- Number of nonemployer firms
- Sales, value of shipments, or revenue of nonemployer firms ($1,000)
- Number of employer firms
- Sales, value of shipments, or revenue of employer firms ($1,000)
- Number of employees
- Annual payroll ($1,000)
These data are aggregated by the following demographic classifications of firm for:
- All firms
- Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
- Veteran Status (defined as having served in any branch of the U.S. Armed Forces)
  - Veteran
  - Equally veteran/nonveteran
  - Nonveteran
- Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.
Geography Coverage: The 2023 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
- United States
- States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
- Metropolitan Statistical Areas
- Micropolitan Statistical Areas
- Metropolitan Divisions
- Combined Statistical Areas
- Counties
- Economic Places
For information about geographies, see Geographies.
Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
- Crop and Animal Production (NAICS 111 and 112)
- Rail Transportation (NAICS 482)
- Postal Service (NAICS 491)
- Monetary Authorities-Central Bank (NAICS 521)
- Funds, Trusts, and Other Financial Vehicles (NAICS 525)
- Office of Notaries (NAICS 541120)
- Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
- Private Households (NAICS 814)
- Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.
Sampling: NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business inco...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics for the demographic variables of the full survey sample.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Bangladesh is one of the most densely populated countries in the world, with more than one-third of its people living in cities, and its air quality is among the worst in the world. The present study aimed to measure knowledge, attitudes and practice (KAP) towards air pollution and health effects among the general population living in the large cities in Bangladesh.
Methods: A cross-sectional e-survey was conducted between May and July 2022 among eight divisions in Bangladesh. A convenience sampling technique was utilized to recruit a total of 1,603 participants (55.58% males; mean age: 23.84 ± 5.93 years). A semi-structured questionnaire including informed consent, socio-demographic information, as well as questions regarding knowledge (11-item), attitudes (7-item) and practice (11-item) towards air pollution, was used to conduct the survey. All analyses (descriptive statistics and regression analyses) were performed using STATA (Version 15.0) and SPSS (Version 26.0).
Results: The mean scores of the knowledge, attitudes, and practice were 8.51 ± 2.01 (out of 11), 19.24 ± 1.56 (out of 21), and 12.65 ± 5.93 (out of 22), respectively. The higher scores of knowledge, attitudes, and practice were significantly associated with several socio-demographic factors, including educational qualification, family type, residential division, cooking fuel type, etc.
Conclusions: The present study found a fair level of knowledge and attitudes towards air pollution; however, the level of practice is not particularly noteworthy. The finding suggests the need to create more awareness among the general population to increase healthy practice to reduce the health effects of air pollution.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset provides aggregated counts of victims and suspects involved in crimes that fall under San Francisco’s mandated crime reporting categories, as recorded by the San Francisco Police Department (SFPD). The data is sourced from Crime Data Warehouse (CDW), which has been in operation since January 1, 2013.
Because CDW was implemented on that date, data prior to 2013 is incomplete or unavailable. To protect the privacy and safety of vulnerable individuals, the dataset is aggregated and does not contain any personally identifiable information or individual case records. Crime categories are organized using:
San Francisco’s 96A.5 “Quarterly Crime Victim Data Reporting”, legislated for victim demographic reporting (Definitions of crime types can be found in Chapter 96A.1)
FBI Uniform Crime Reporting (UCR) system (Definitions can be found on the SFPD website.)
This dataset also powers the public crime dashboards on the SFPD website, where users can explore summary statistics.
B. HOW THE DATASET IS CREATED Data is added to open data once a quarter after extraction, transformation, and aggregation.
Disclaimer: The San Francisco Police Department does not guarantee the accuracy, completeness, timeliness or correct sequencing of the information as the data is subject to change as modifications and updates are completed.
C. UPDATE PROCESS Information is updated on a quarterly basis.
D. HOW TO USE THIS DATASET This dataset provides aggregated counts of individuals involved in reported crimes, categorized by key demographics and crime-related attributes. It is used to power public-facing dashboards on the San Francisco Police Department (SFPD) website, where summary statistics and visualizations allow users to explore crime and victimization trends across the city. While the SFPD public dashboard provides many useful summaries and visualizations, not all data details are displayed there. For deeper or custom analysis, the full dataset can be downloaded for personal use.
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations (Ancestry.com and FamilySearch) and provide the largest and richest source of individual-level and household-level data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
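The mother's-age-at-birth example mentioned above can be sketched as a within-household link between each child and their mother. The record layout below (`hh`, `pernum`, `age`, `mom_pernum`, with 0 meaning no mother present in the household) is a hypothetical simplification for illustration, not the actual variable names of the data.

```python
# Hypothetical person records; field names and values are illustrative only.
# mom_pernum points to the mother's person number within the same household
# (0 means no mother is present in the household).
persons = [
    {"hh": 1, "pernum": 1, "age": 34, "mom_pernum": 0},
    {"hh": 1, "pernum": 2, "age": 10, "mom_pernum": 1},
    {"hh": 1, "pernum": 3, "age": 6, "mom_pernum": 1},
    {"hh": 2, "pernum": 1, "age": 52, "mom_pernum": 0},
    {"hh": 2, "pernum": 2, "age": 25, "mom_pernum": 1},
]


def mothers_age_at_birth(records):
    """Map (household, child person number) to the mother's age at the child's birth."""
    by_key = {(p["hh"], p["pernum"]): p for p in records}
    result = {}
    for p in records:
        if p["mom_pernum"]:
            mother = by_key.get((p["hh"], p["mom_pernum"]))
            if mother is not None:
                # Ages are whole years, so the result is accurate to about a year.
                result[(p["hh"], p["pernum"])] = mother["age"] - p["age"]
    return result


print(mothers_age_at_birth(persons))
```

The calculation only covers children who live in the same household as their mother at the time of enumeration.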
The historic US 1940 census data were collected in April 1940. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
Notes
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Background: In oral health, little progress has been made on how to improve the quality of dental care.
Objective: The aim of this study was to identify the demographic factors affecting the satisfaction of users of the public dental service, using a strategic and highly consistent methodology.
Method: Data Mining was applied to a secondary database comprising 91 features, segmented into 9 demographic factors, 17 facets, and 5 domains. Descriptive statistics were extracted for the demographic data and for user satisfaction by facet and domain, and relationships were discovered using Decision Trees and Association Rules.
Results: The analysis showed a relationship between the demographic factor 'professional occupation' and satisfaction in all of the domains. The occupations of general assistant and home assistant with daily wage stood out in the Association Rules as representing the lowest level of satisfaction for the facets that were rated worst. The factor 'health unit's name' also showed a relationship with most of the investigated domains. The difference between health units was even more evident through the Association Rules.
Conclusion: Data Mining made it possible to identify complementary relationships in users' perception of the quality of public oral health services, constituting a safe tool to support the management of Brazilian public health and a basis for future plans.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics (mean score and standard deviation; the percentage for ‘male’) of the demographic characteristics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic traits descriptive statistics.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Race for the U.S., States, Metro Areas, Counties, and Places: 2023
Table ID: ABSNESD2023.AB00MYNESD01C
Survey/Program: Economic Surveys
Year: 2023
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2023 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-11-20
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation
Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms).
Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series).
Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2024 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered.
Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2024 ABS collection year produces statistics for the 2023 reference year. The "Year" column in the table is the reference year.
Methodology
Data Items and Other Identifying Records:
- Total number of employer and nonemployer firms
- Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
- Number of nonemployer firms
- Sales, value of shipments, or revenue of nonemployer firms ($1,000)
- Number of employer firms
- Sales, value of shipments, or revenue of employer firms ($1,000)
- Number of employees
- Annual payroll ($1,000)
These data are aggregated by the following demographic classifications of firm for:
- All firms
- Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
- Race
  - White
  - Black or African American
  - American Indian and Alaska Native
  - Asian
  - Native Hawaiian and Other Pacific Islander
- Minority (Firms classified as any race and ethnicity combination other than non-Hispanic and White)
- Equally minority/nonminority
- Nonminority (Firms classified as non-Hispanic and White)
- Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.
Geography Coverage: The 2023 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
- United States
- States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
- Metropolitan Statistical Areas
- Micropolitan Statistical Areas
- Metropolitan Divisions
- Combined Statistical Areas
- Counties
- Economic Places
For information about geographies, see Geographies.
Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
- Crop and Animal Production (NAICS 111 and 112)
- Rail Transportation (NAICS 482)
- Postal Service (NAICS 491)
- Monetary Authorities-Central Bank (NAICS 521)
- Funds, Trusts, and Other Financial Vehicles (NAICS 525)
- Office of Notaries (NAICS 541120)
- Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
- Private Households (NAICS 814)
- Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.
Sa...
Demographic Structure
This dataset falls under the category Traffic Generating Parameters Population.
It contains the following data: Descriptive statistics on sex and age of the population of the City of Buenos Aires.
This dataset was scouted on 2022-02-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://data.buenosaires.gob.ar/dataset/estructura-demografica
https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores factors associated with the reception of COVID-19 related content on TikTok. It captures overall levels of user engagement (likes, comments, and views) and source credibility, including whether the content comes from healthcare professionals, news sources, patients, or other outlets. It also covers demographic factors such as gender and age range, and content type, such as humor or the provision of clinical instruction. In addition, it records whether posts describe risk factors and symptoms, modes of transmission, and prevention. Finally, a discernment component rates each post for its level of misinformation (low, moderate, or high). Together, these measures provide insights into how users engage with COVID-19 related misinformation on TikTok.
This dataset contains user engagement data and measures of source credibility related to COVID-19 misinformation on TikTok. It can be used to examine the factors associated with content reception, such as views, likes, comments, as well as factors relating to credibility, demographics and content type.
Using this dataset: - Explore the columns available in the dataset. There are a number of columns that measure user engagement (views, likes and comments) as well as source credibility (official source, healthcare professional etc.), demographic factors (gender, age group etc.), and content type (humor etc). Get familiar with all these columns so that you know what information is available for analysis.
- Decide what kind of analysis you want to perform. You can use this data for exploratory or explanatory work, depending on your aims or research question. For example, if you want to see how source credibility affects user engagement, you would need inferential techniques such as correlation tests or regression analyses, whereas if you just want an overall understanding of patterns in the data, exploratory techniques such as cross-tabulations may be more suitable.
- Developing a predictive model to identify which demographic and source characteristics are correlated with high user engagement for COVID-related posts on TikTok (e.g. views, likes, and comments).
- Investigating the difference in user engagement for posts from healthcare professionals vs non-professional sources to compare how different types of content are received by users on TikTok.
- Analyzing the sentiment of words related to masks and tests in order to gain insights into how content about this topic is perceived by users on TikTok (i.e., positive or negative sentiment)
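The comparison of engagement between healthcare-professional and other sources suggested above can be sketched as a grouped median. The column names `views`, `likes`, `comments`, and `pub_hcp` come from the file description; the sample rows and the `True`/`False` text encoding of the Boolean column are assumptions made for illustration, and `median_by_flag` is a hypothetical helper.

```python
import csv
import io
import statistics

# Hypothetical sample rows; the values and the True/False encoding are assumptions.
sample = """views,likes,comments,pub_hcp
1000,120,10,True
5000,640,55,True
800,40,3,False
1200,90,8,False
300,15,1,False
"""


def median_by_flag(csv_text, flag, metric):
    """Median of a numeric column, split by whether a Boolean column is set."""
    groups = {True: [], False: []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        is_set = row[flag].strip().lower() in {"true", "1"}
        groups[is_set].append(int(row[metric]))
    # Skip empty groups so statistics.median never sees an empty list.
    return {key: statistics.median(vals) for key, vals in groups.items() if vals}


medians = median_by_flag(sample, "pub_hcp", "likes")
print(medians)
```

The median is used here because engagement counts are typically heavily skewed; to run this on the real file, the sample text would be replaced with the contents of tiktok_data_open.csv after checking how its Boolean columns are actually encoded.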
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: tiktok_data_open.csv

| Column name | Description |
|:----------------|:------------------------------------------------------------------------|
| views | Number of views for the video. (Integer) |
| likes | Number of likes for the video. (Integer) |
| comments | Number of comments for the video. (Integer) |
| official_source | Whether the source of the video is an official source. (Boolean) |
| pub_hcp | Whether the source of the video is a healthcare professional. (Boolean) |
| pub_news | Whether the source of the video is a news source. (Boolean) |
| pub_patient | Whether the source of the video is a patient. (Boolean) |
| pub_other | Whether the source of the video is another source. (Boolean) |
| female ...
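The comparison of engagement between healthcare-professional and other sources mentioned above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the dataset authors' method. The column names `views`, `likes`, `comments`, and `pub_hcp` come from the column table; the inline rows are made-up toy values, the helper names `parse_bool` and `engagement_by_flag` are hypothetical, and the assumption that Boolean columns are stored as text such as `True`/`False` should be checked against the real file.

```python
import csv
import io
from statistics import mean, median

# Toy rows for illustration only; these values are NOT from tiktok_data_open.csv.
SAMPLE_CSV = """views,likes,comments,pub_hcp
1000,100,10,True
3000,500,30,True
200,10,2,False
800,40,6,False
400,25,4,False
"""

METRICS = ("views", "likes", "comments")


def parse_bool(value):
    """Interpret common textual encodings of a Boolean flag (assumed encoding)."""
    return str(value).strip().lower() in {"true", "1", "yes"}


def engagement_by_flag(rows, flag="pub_hcp", metrics=METRICS):
    """Summarize each engagement metric separately for flagged and unflagged rows."""
    groups = {True: [], False: []}
    for row in rows:
        groups[parse_bool(row[flag])].append(row)

    summary = {}
    for flag_value, members in groups.items():
        if not members:
            continue  # skip empty groups instead of failing on mean([])
        summary[flag_value] = {}
        for metric in metrics:
            values = [int(row[metric]) for row in members]
            summary[flag_value][metric] = {
                "n": len(values),
                "mean": mean(values),
                "median": median(values),
            }
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = engagement_by_flag(rows)
for flag_value, stats in summary.items():
    print("pub_hcp =", flag_value, "views:", stats["views"])
```

To run this on the real file, the `io.StringIO(SAMPLE_CSV)` source can be swapped for `open("tiktok_data_open.csv", newline="")`. Engagement counts are typically heavily skewed, so the median is reported alongside the mean; a formal group comparison would still need an appropriate statistical test.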
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Ethnicity for the U.S., States, Metro Areas, Counties, and Places: 2022
Table ID: ABSNESD2022.AB00MYNESD01B
Survey/Program: Economic Surveys
Year: 2022
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2022 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-05-08
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation
Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms). Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series). Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2023 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2023 ABS collection year produces statistics for the 2022 reference year. The "Year" column in the table is the reference year.

Methodology
Data Items and Other Identifying Records:
- Total number of employer and nonemployer firms
- Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
- Number of nonemployer firms
- Sales, value of shipments, or revenue of nonemployer firms ($1,000)
- Number of employer firms
- Sales, value of shipments, or revenue of employer firms ($1,000)
- Number of employees
- Annual payroll ($1,000)
These data are aggregated by the following demographic classifications of firm:
- All firms
- Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
- Ethnicity: Hispanic; Equally Hispanic/non-Hispanic; Non-Hispanic
- Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.
Geography Coverage: The 2022 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
- United States
- States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
- Metropolitan Statistical Areas
- Micropolitan Statistical Areas
- Metropolitan Divisions
- Combined Statistical Areas
- Counties
- Economic Places
For information about geographies, see Geographies.
Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
- Crop and Animal Production (NAICS 111 and 112)
- Rail Transportation (NAICS 482)
- Postal Service (NAICS 491)
- Monetary Authorities-Central Bank (NAICS 521)
- Funds, Trusts, and Other Financial Vehicles (NAICS 525)
- Office of Notaries (NAICS 541120)
- Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
- Private Households (NAICS 814)
- Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.
Sampling: NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Race for the U.S., States, Metro Areas, Counties, and Places: 2022
Table ID: ABSNESD2022.AB00MYNESD01C
Survey/Program: Economic Surveys
Year: 2022
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2022 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-05-08
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation
Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms). Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series). Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2023 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2023 ABS collection year produces statistics for the 2022 reference year. The "Year" column in the table is the reference year.

Methodology
Data Items and Other Identifying Records:
- Total number of employer and nonemployer firms
- Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
- Number of nonemployer firms
- Sales, value of shipments, or revenue of nonemployer firms ($1,000)
- Number of employer firms
- Sales, value of shipments, or revenue of employer firms ($1,000)
- Number of employees
- Annual payroll ($1,000)
These data are aggregated by the following demographic classifications of firm:
- All firms
- Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
- Race: White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Minority (Firms classified as any race and ethnicity combination other than non-Hispanic and White); Equally minority/nonminority; Nonminority (Firms classified as non-Hispanic and White)
- Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.
Geography Coverage: The 2022 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
- United States
- States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
- Metropolitan Statistical Areas
- Micropolitan Statistical Areas
- Metropolitan Divisions
- Combined Statistical Areas
- Counties
- Economic Places
For information about geographies, see Geographies.
Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
- Crop and Animal Production (NAICS 111 and 112)
- Rail Transportation (NAICS 482)
- Postal Service (NAICS 491)
- Monetary Authorities-Central Bank (NAICS 521)
- Funds, Trusts, and Other Financial Vehicles (NAICS 525)
- Office of Notaries (NAICS 541120)
- Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
- Private Households (NAICS 814)
- Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.
Sa...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Ethnicity for the U.S., States, Metro Areas, Counties, and Places: 2023
Table ID: ABSNESD2023.AB00MYNESD01B
Survey/Program: Economic Surveys
Year: 2023
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2023 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-11-20
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation
Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms). Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series). Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2024 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2024 ABS collection year produces statistics for the 2023 reference year. The "Year" column in the table is the reference year.

Methodology
Data Items and Other Identifying Records:
- Total number of employer and nonemployer firms
- Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
- Number of nonemployer firms
- Sales, value of shipments, or revenue of nonemployer firms ($1,000)
- Number of employer firms
- Sales, value of shipments, or revenue of employer firms ($1,000)
- Number of employees
- Annual payroll ($1,000)
These data are aggregated by the following demographic classifications of firm:
- All firms
- Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
- Ethnicity: Hispanic; Equally Hispanic/non-Hispanic; Non-Hispanic
- Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.
Geography Coverage: The 2023 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
- United States
- States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
- Metropolitan Statistical Areas
- Micropolitan Statistical Areas
- Metropolitan Divisions
- Combined Statistical Areas
- Counties
- Economic Places
For information about geographies, see Geographies.
Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
- Crop and Animal Production (NAICS 111 and 112)
- Rail Transportation (NAICS 482)
- Postal Service (NAICS 491)
- Monetary Authorities-Central Bank (NAICS 521)
- Funds, Trusts, and Other Financial Vehicles (NAICS 525)
- Office of Notaries (NAICS 541120)
- Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
- Private Households (NAICS 814)
- Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.
Sampling: NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry, Sex, Ethnicity, Race, and Veteran Status for the U.S., States, Metro Areas, Counties, and Places: 2022
Table ID: ABSNESD2022.AB2200NESD01
Survey/Program: Economic Surveys
Year: 2022
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2022 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-05-08
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation
Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms). Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series). Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2023 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2023 ABS collection year produces statistics for the 2022 reference year. The "Year" column in the table is the reference year.

Methodology
Data Items and Other Identifying Records:
- Total number of employer and nonemployer firms
- Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
- Number of nonemployer firms
- Sales, value of shipments, or revenue of nonemployer firms ($1,000)
- Number of employer firms
- Sales, value of shipments, or revenue of employer firms ($1,000)
- Number of employees
- Annual payroll ($1,000)
These data are aggregated by sex, ethnicity, race, and veteran status when classifiable.
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.
Geography Coverage: The 2022 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
- United States
- States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
- Metropolitan Statistical Areas
- Micropolitan Statistical Areas
- Metropolitan Divisions
- Combined Statistical Areas
- Counties
- Economic Places
For information about geographies, see Geographies.
Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors.
Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
- Crop and Animal Production (NAICS 111 and 112)
- Rail Transportation (NAICS 482)
- Postal Service (NAICS 491)
- Monetary Authorities-Central Bank (NAICS 521)
- Funds, Trusts, and Other Financial Vehicles (NAICS 525)
- Office of Notaries (NAICS 541120)
- Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
- Private Households (NAICS 814)
- Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.
Sampling: NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the Census Bureau. The NES-D adds demographic characteristics to the NES data and produces the total firm counts and the total receipts by those demographic characteristics. The NES-D utilizes various admini...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
U.S. Population Grids (Summary File 1), 2000: New Orleans Metropolitan Statistical Area, Alpha Version contains an ARC/INFO Workspace with grids of demographic data from the 2000 census. The grids have a resolution of 7.5 arc-seconds (0.002075 decimal degrees), or approximately 250 square meters. The gridded variables are based on census block geography from Census 2000 TIGER/Line Files and census variables (population, households, and housing variables) from Summary File 1. This data set is produced by the Columbia University Center for International Earth Science Information Network (CIESIN). Its purpose is to provide gridded demographic data, including characteristics of age, race, ethnicity, and housing, for metropolitan statistical areas at a finer resolution than is available in the 30 arc-second grids used for the United States as a whole.
This data collection contains detailed county and state-level ecological and descriptive data for the United States for the years 1790 to 2002. Parts 1-43 are an update to HISTORICAL, DEMOGRAPHIC, ECONOMIC, AND SOCIAL DATA: THE UNITED STATES, 1790-1970 (ICPSR 0003). Parts 1-41 contain data from the 1790-1970 censuses. They include extensive information about the social and political character of the United States, including a breakdown of population by state, race, nationality, number of families, size of the family, births, deaths, marriages, occupation, religion, and general economic condition. Parts 42 and 43 contain data from the 1840 and 1870 Censuses of Manufacturing, respectively. These files include information about the number of persons employed in various industries and the quantities of different types of manufactured products. Parts 44-50 provide county-level data from the United States Census of Agriculture for 1840 to 1900. They also include the state and national totals for the variables. The files provide data about the number, types, and prices of various agricultural products. Parts 51-57 contain data on religious bodies and church membership for 1906, 1916, 1926, 1936, and 1952, respectively. Parts 58-69 consist of data from the CITY DATA BOOKS for 1944, 1948, 1952, 1956, 1962, 1967, 1972, 1977, 1983, 1988, 1994, and 2000, respectively. These files contain information about population, climate, housing units, hotels, birth and death rates, school enrollment and education expenditures, employment in various industries, and city government finances. Parts 70-81 consist of data from the COUNTY DATA BOOKS for 1947, 1949, 1952, 1956, 1962, 1967, 1972, 1977, 1983, 1988, 1994, and 2000, respectively. These files include information about population, employment, housing, agriculture, manufacturing, retail, services, trade, banking, Social Security, local governments, school enrollment, hospitals, crime, and income.
Parts 82-84 contain data from USA COUNTIES 1998. Due to the large number of variables from this source, the data were divided into three separate data files. Data include information on population, vital statistics, school enrollment, educational attainment, Social Security, labor force, personal income, poverty, housing, trade, farms, ancestry, commercial banks, and transfer payments. Parts 85-106 provide data from the United States Census of Agriculture for 1910 to 2002. They provide data about the amount, types, and prices of various agricultural products. Also, these datasets contain extensive information on the amount, expenses, sales, values, and production of farms and machinery. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR02896.v3. We highly recommend using the ICPSR version, as they made this dataset available in multiple data formats and updated the data through 2002.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83,3.52). 
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
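The error metrics and bootstrapped confidence intervals reported above can be illustrated with a small sketch. The description does not say which bootstrap variant the study used, so the percentile method here is an assumption, as are the function names, the number of resamples, and the seed. The observed and predicted values are made-up toy numbers, not the study's data.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (assumed variant, see lead-in).

    Observation/prediction pairs are resampled together, with replacement,
    and the metric is recomputed on every resample.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy vitamin D levels and model predictions; illustrative values only.
observed = [20.0, 25.0, 18.0, 30.0, 22.0, 27.0]
predicted = [22.0, 24.0, 15.0, 33.0, 22.0, 25.0]

point_rmse = rmse(observed, predicted)
point_mae = mae(observed, predicted)
rmse_ci = bootstrap_ci(observed, predicted, rmse)
mae_ci = bootstrap_ci(observed, predicted, mae)
print(f"RMSE {point_rmse:.2f} (CI: {rmse_ci[0]:.2f}, {rmse_ci[1]:.2f})")
print(f"MAE  {point_mae:.2f} (CI: {mae_ci[0]:.2f}, {mae_ci[1]:.2f})")
```

With only a handful of held-out points, as in a 50-patient study, such intervals are wide, which is consistent with the broad ranges reported for the LR model.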