53 datasets found
  1. Higher Education Institutions in the USA

    • kaggle.com
    zip
    Updated Apr 8, 2023
    Cite
    Jackson Júnior (2023). Higher Education Institutions in the USA [Dataset]. https://www.kaggle.com/datasets/jacksonbarreto/higher-education-institutions-in-the-usa/data
    Available download formats: zip (35907 bytes)
    Dataset updated
    Apr 8, 2023
    Authors
    Jackson Júnior
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Higher Education Institutions in the United States of America Dataset

    This repository contains a dataset of higher education institutions in the United States of America. The dataset was compiled for a cybersecurity study of American higher education institutions' websites [1]. The data is being made publicly available to promote open science principles [2].

    Data

    The data includes the following fields for each institution:

    • Id: A unique identifier assigned to each institution.
    • Region: The federal state in which the institution is located.
    • Name: The full name of the institution.
    • Category: Indicates whether the institution is public or private.
    • Url: The website of the institution.

    Methodology

    The dataset was obtained from the Integrated Postsecondary Education Data System (IPEDS) website [3], which is administered by the National Center for Education Statistics (NCES). NCES serves as the primary federal entity for collecting and analyzing education-related data in the United States. The data was collected on February 2, 2023.

    The initial list of institutions was derived from the IPEDS database using the following criteria: (1) US institutions only, (2) degree-granting institutions, primarily bachelor's or higher, and (3) sector of institution, which includes: public, 4-year or above; private not-for-profit, 4-year or above; private for-profit, 4-year or above; public, 2-year; private not-for-profit, 2-year; private for-profit, 2-year; public, less-than-2-year; private not-for-profit, less-than-2-year; and private for-profit, less-than-2-year.

    The following variables were added to the list of institutions: control of the institution, state abbreviation, degree-granting status, status of the institution, and the institution's website address. This resulted in a report with 1,979 institutions.

    The institution's status was labeled with the following values: A (Active), N (New), R (Restored), M (Closed in the current year), C (Combined with another institution), D (Deleted, out of business), I (Inactive due to hurricane-related issues), O (Outside IPEDS scope), P (Potential new/add institution), Q (Potential institution reestablishment), W (Potential addition outside IPEDS scope), X (Potential restoration outside IPEDS scope), and G (Perfect child campus).

    A filter was applied to the report to retain only institutions with an A, N, or R status, resulting in 1,978 institutions. Finally, a data cleaning process was applied, which involved removing the whitespace at the beginning and end of cell content and duplicate whitespace. The final data were compiled into the dataset included in this repository.
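
    The status filter and whitespace cleaning described above can be sketched roughly as follows. This is a minimal illustration and not the authors' actual script: the `Status` field name, the sample rows, and the helper names `clean_cell` and `clean_rows` are assumptions made for the example.

```python
import re

# Status codes retained by the filter described above: Active, New, Restored.
ACTIVE_STATUSES = {"A", "N", "R"}


def clean_cell(value: str) -> str:
    """Strip leading/trailing whitespace and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", value.strip())


def clean_rows(rows):
    """Clean every cell, then keep rows whose (hypothetical) 'Status' is A, N, or R."""
    cleaned = []
    for row in rows:
        row = {key: clean_cell(val) for key, val in row.items()}
        if row.get("Status") in ACTIVE_STATUSES:
            cleaned.append(row)
    return cleaned


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"Name": "  Example   State  University ", "Status": "A"},
    {"Name": "Closed College", "Status": "M"},
]
result = clean_rows(sample)
```

    Cleaning before filtering means a status value with stray spaces (for example `" A "`) is still recognized.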

    Usage

    This data is available under the Creative Commons Zero (CC0) license and can be used for any purpose, including academic research purposes. We encourage the sharing of knowledge and the advancement of research in this field by adhering to open science principles [2].

    If you use this data in your research, please cite the source and include a link to this repository. To properly attribute this data, please use the following DOI: 10.5281/zenodo.7614862

    Contribution

    If you have any updates or corrections to the data, please feel free to open a pull request or contact us directly. Let's work together to keep this data accurate and up-to-date.

    Acknowledgment

    We would like to acknowledge the support of the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), within the project "Cybers SeC IP" (NORTE-01-0145-FEDER-000044). This study was also developed as part of the Master in Cybersecurity Program at the Instituto Politécnico de Viana do Castelo, Portugal.

    References

    1. Pending.
    2. S. Bezjak, A. Clyburne-Sherin, P. Conzett, P. Fernandes, E. Görögh, K. Helbig, B. Kramer, I. Labastida, K. Niemeyer, F. Psomopoulos, T. Ross-Hellauer, R. Schneider, J. Tennant, E. Verbakel, H. Brinken, and L. Heller, Open Science Training Handbook. Zenodo, Apr. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1212496
    3. Integrated Postsecondary Education Data System, "Compare Institutions", Feb. 2023. [Online]. Available: https://nces.ed.gov/ipeds/use-the-data
  2. College Majors and their Graduates

    • kaggle.com
    zip
    Updated Dec 6, 2022
    Cite
    The Devastator (2022). College Majors and their Graduates [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-insights-to-college-majors-and-their
    Available download formats: zip (39859 bytes)
    Dataset updated
    Dec 6, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    College Majors and their Graduates

    Job Opportunities, Salaries and Gender Disparities

    By FiveThirtyEight [source]

    About this dataset

    This repository contains a comprehensive selection of lavish data and processing scripts behind the articles, graphics, and interactive experiences generated by FiveThirtyEight. With this dataset, you'll have the power to explore college programs and their graduates like never before and create stories of your own! Whether you use it to check our work or craft your own powerful visuals, we would absolutely love to know if you found it useful. Under the Creative Commons Attribution 4.0 International License and MIT License respectively, our data is available for anyone who chooses to use it. Let us know how our resources turned out at

    Research Ideas

    • Create an interactive comparison tool for researching college majors and their earning potential, so that prospective students can make informed decisions about what to study.
    • Analyze the proportions of male and female graduates across different majors to uncover gender disparities in higher education.
    • Explore the correlations between major categories, average salaries earned by graduates from specific major categories, unemployment rates for those with specific majors and more – to identify trends in job opportunities for certain specialties of study

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: majors-list.csv

    | Column name | Description |
    |:---------------|:----------------------------------------------------|
    | FOD1P | First-level division of the field of study (String) |
    | Major | The specific major of the field of study (String) |
    | Major_Category | The broader category of the field of study (String) |

    File: recent-grads.csv

    | Column name | Description |
    |:---------------------|:-------------------------------------------------------------------------------|
    | Major | The specific major of the field of study (String) |
    | Rank | The rank of the major in terms of popularity (Integer) |
    | Major_code | The code associated with the major (Integer) |
    | Major_category | The category of the major (String) |
    | Total | The total number of students in the major (Integer) |
    | Sample_size | The sample size of the major (Integer) |
    | Men | The number of male students in the major (Integer) |
    | Women | The number of female students in the major (Integer) |
    | ShareWomen | The percentage of female students in the major (Float) |
    | Employed | The number of employed graduates from the major (Integer) |
    | Full_time | The number of full-time employed graduates from the major (Integer) |
    | Part_time | The number of part-time employed graduates from the major (Integer) |
    | Full_time_year_round | The number of full-time year-round employed graduates from the major (Integer) |
    | Unemployed | The number of unemployed graduates from the major (Integer) |
    | Unemployment_rate | The unemployment rate of graduates from the major (Float) |
    | Median | The median salary of graduates from the major (Integer) |
    | P25th | The 25th percentile salary of graduates from the major (Integer) |
    | P75th | The 75th percentile salary of graduates from the major (Integer) |
    | College_jobs | The number of college jobs held by graduates from the major...

  3. National Survey of College Graduates

    • catalog.data.gov
    Updated Mar 5, 2022
    + more versions
    Cite
    National Center for Science and Engineering Statistics (2022). National Survey of College Graduates [Dataset]. https://catalog.data.gov/dataset/national-survey-of-college-graduates
    Dataset updated
    Mar 5, 2022
    Dataset provided by
    National Center for Science and Engineering Statistics (http://ncses.nsf.gov/)
    Description

    The National Survey of College Graduates is a repeated cross-sectional biennial survey that provides data on the nation's college graduates, with a focus on those in the science and engineering workforce. This survey is a unique source for examining the relationship of degree field and occupation in addition to other characteristics of college-educated individuals, including work activities, salary, and demographic information.

  4. Educational attainment in the U.S. 1960-2022

    • statista.com
    Updated May 30, 2025
    Cite
    Statista (2025). Educational attainment in the U.S. 1960-2022 [Dataset]. https://www.statista.com/statistics/184260/educational-attainment-in-the-us/
    Dataset updated
    May 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2022, about 37.7 percent of the U.S. population aged 25 and above had graduated from college or another higher education institution, a slight decline from 37.9 percent the previous year. However, this is a significant increase from 1960, when only 7.7 percent of the U.S. population had graduated from college.

    Demographics

    Educational attainment varies by gender, location, race, and age throughout the United States. Asian Americans and Pacific Islanders had the highest level of education, on average, while Massachusetts and the District of Columbia are home to the highest rates of residents with a bachelor’s degree or higher. However, education levels are correlated with wealth. While public education is free up until the 12th grade, the cost of university is out of reach for many Americans, making social mobility increasingly difficult.

    Earnings

    White Americans with a professional degree earned the most money on average, compared to other educational levels and races. However, regardless of educational attainment, males typically earned far more on average than females. Although the wage gap has narrowed over the years, it remains an issue to this day. Not only is there a large wage gap between males and females, but there is also a large income gap linked to race.

  5. USA-National Center for Education Statistics Data

    • kaggle.com
    zip
    Updated Mar 27, 2023
    Cite
    Brijesh Kumar Awasthi (2023). USA-National Center for Education Statistics Data [Dataset]. https://www.kaggle.com/datasets/brijeshawasthi/nces-data/discussion
    Available download formats: zip (4354 bytes)
    Dataset updated
    Mar 27, 2023
    Authors
    Brijesh Kumar Awasthi
    Area covered
    United States
    Description

    NOTE: Data in this table represent the 50 states and the District of Columbia. Data through 1995 are for institutions of higher education, while later data are for degree-granting institutions. Degree-granting institutions grant associate’s or higher degrees and participate in Title IV federal financial aid programs. The degree-granting classification is very similar to the earlier higher education classification, but it includes more 2-year colleges and excludes a few higher education institutions that did not grant degrees. Projections in this table were calculated after the onset of the coronavirus pandemic and take into account the expected impacts of the pandemic. Some data have been revised from previously published figures.

  6. Educational Attainment

    • data.ccrpc.org
    csv
    Updated Oct 16, 2024
    Cite
    Champaign County Regional Planning Commission (2024). Educational Attainment [Dataset]. https://data.ccrpc.org/dataset/educational-attainment
    Available download formats: csv
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Champaign County Regional Planning Commission
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Overall educational attainment measures the highest level of education attained by a given individual: for example, an individual counted in the percentage of the measured population with a master’s or professional degree can be assumed to also have a bachelor’s degree and a high school diploma, but they are not counted in the population percentages for those two categories. Overall educational attainment is the broadest education indicator available, providing information about the measured county population as a whole.
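
    Because each person is counted only in their highest category, the share with at least a given level has to be summed across that category and every higher one. A minimal sketch, assuming hypothetical category names and made-up shares (none of these values come from the dataset):

```python
# Attainment categories ordered from lowest to highest (hypothetical names).
LEVELS = [
    "less_than_high_school",
    "high_school",
    "some_college",
    "bachelors",
    "graduate_or_professional",
]


def share_at_least(shares, level):
    """Sum the mutually exclusive shares for `level` and every higher level."""
    start = LEVELS.index(level)
    return sum(shares[name] for name in LEVELS[start:])


# Made-up percentages that add up to 100; each person appears in one category only.
shares = {
    "less_than_high_school": 10.0,
    "high_school": 25.0,
    "some_college": 25.0,
    "bachelors": 25.0,
    "graduate_or_professional": 15.0,
}
at_least_bachelors = share_at_least(shares, "bachelors")
```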

    Only members of the population aged 25 and older are included in these educational attainment estimates, sourced from the U.S. Census Bureau American Community Survey (ACS).

    Champaign County has high educational attainment: over 48 percent of the county's population aged 25 or older has a bachelor's degree or graduate or professional degree as their highest level of education. In comparison, the percentage of the population aged 25 or older in the United States and Illinois with a bachelor's degree in 2023 was 21.8% (+/-0.1) and 22.8% (+/-0.2), respectively. The population aged 25 or older in the U.S. and Illinois with a graduate or professional degree in 2022, respectively, was 14.3% (+/-0.1) and 15.5% (+/-0.2).

    Educational attainment data was sourced from the U.S. Census Bureau’s American Community Survey 1-Year Estimates, which are released annually.

    As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
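
    One common way to take margins of error into account when comparing two estimates is a simple significance check. The sketch below assumes the margins of error are published at the 90 percent confidence level (the usual ACS convention) and that the two estimates are independent; the function name `differs_significantly` is hypothetical.

```python
import math

# Assumption: margins of error are reported at the 90% confidence level,
# so the standard error is the margin of error divided by 1.645.
Z_90 = 1.645


def differs_significantly(est_a, moe_a, est_b, moe_b):
    """Return True if two estimates differ at the 90% confidence level."""
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    z = (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)
    return abs(z) > Z_90


# Bachelor's degree shares quoted in the description: U.S. 21.8% (+/-0.1), Illinois 22.8% (+/-0.2).
us_vs_illinois = differs_significantly(21.8, 0.1, 22.8, 0.2)
```

    With these inputs the one-point gap is far larger than the combined margin of error, so the check reports a significant difference.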

    Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

    For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Educational Attainment for the Population 25 Years and Over.

    Sources:

    • U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using data.census.gov; (16 October 2024).
    • U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using data.census.gov; (29 September 2023).
    • U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using data.census.gov; (6 October 2022).
    • U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using data.census.gov; (4 June 2021).
    • U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using data.census.gov; (4 June 2021).
    • U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (13 September 2018).
    • U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (13 September 2018).
    • U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (19 September 2016).
    • U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S1501; generated by CCRPC staff; using American FactFinder; (16 March 2016).

  7. Data from: College Scorecard - U.S Department of Education

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    The Devastator (2022). College Scorecard - U.S Department of Education [Dataset]. https://www.kaggle.com/datasets/thedevastator/u-s-department-of-education-college-scorecard-da
    Available download formats: zip (1183961 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    The Devastator
    Description

    College Scorecard

    The College Scorecard dataset is provided by the U.S. Department of Education and contains information on nearly every college and university in the United States. The dataset includes data on student loan repayment rates, graduation rates, affordability, earnings after graduation, and more. The goal of this dataset is to help students make informed decisions about their college choice by providing them with clear and concise information about each school's performance.

    How to use the dataset

    This dataset can help users understand the cost of attending college in the United States, as well as the average debt load for students. It can also be used to compare schools by their graduation rates and repayment rates.

    Columns

    • UNITID: Unit ID for institution
    • INSTNM: Institution name
    • CITY: City
    • STABBR: State
    • ZIP: Zip code
    • OPEID: OPE ID for institution
    • OPEID6: OPE ID for institution (6-digit)
    • ACCREDAGENCY: Accrediting Agency
    • INSTURL: Institution URL
    • NPCURL: Net Price Calculator URL
    • SCH_DEG: Highest degree awarded
    • HCM2: Carnegie Classification 2010: Basic
    • MAIN: Carnegie Classification 2010: Main
    • NUMBRANCH: Number of branch campuses
    • PREDDEG: Predominant degree awarded
    • HIGHDEG: Highest degree awarded
    • CONTROL: Control of institution
    • ST_FIPS: State FIPS code
    • REGION: Region
    • LOCALE: Locale code
    • LOCALE2: Locale code (multiple categories per state)
    • CCBASIC: Carnegie Classification 2010: Basic
    • CCMAIN: Carnegie Classification 2010: Main
    • CCUGPROF: Carnegie Classification 2010: Undergraduate Profile
    • CCSIZSET: Carnegie Classification 2010: Size and Setting
    • HBCU: Historically Black College or University
    • PBI: Predominantly Black Institution
    • ANNHI: Tribal College or University
    • TRIBAL: Tribal College or University (Public)
    • AANAPII: Asian American and Native American Pacific Islander-Serving Institution
    • HSIP: Hispanic-Serving Institution (HSI)
    • NANTI: Native American-Serving Nontribal Institution
    • MENONLY: Men only
    • WOMENONLY: Women only
    • RELAFFIL: Religious affiliation
    • DISTANCEONLY: Distance-only
    • CURROPER: Currently operating
    • VETERAN: Veteran-supportive
    • LIMDEP: Limited-degree-granting
    • HIGHDEG_GRANTED: Highest degree granted
    • PS: Predominantly two-year public
    • UGRD_ENRL_TOTAL: Undergraduate total enrollment
    • GRAD_ENRL_TOTAL: Graduate total enrollment
    • UGRD_ENRL_ORIG_YR2_RT: Undergraduate, first-time, first-year retention rate (%)

    Acknowledgements

    This data was originally collected by the US Department of Education and made available on their website. Thank you to the US Department of Education for making this data available!

  8. College enrollment in public and private institutions in the U.S. 1965-2031

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). College enrollment in public and private institutions in the U.S. 1965-2031 [Dataset]. https://www.statista.com/statistics/183995/us-college-enrollment-and-projections-in-public-and-private-institutions/
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    There were approximately 18.58 million college students in the U.S. in 2022, with around 13.49 million enrolled in public colleges and a further 5.09 million students enrolled in private colleges. The figures are projected to remain relatively constant over the next few years.

    What is the most expensive college in the U.S.?

    The overall number of higher education institutions in the U.S. totals around 4,000, and California is the state with the most. One important factor that students – and their parents – must consider before choosing a college is cost. With annual expenses totaling almost 78,000 U.S. dollars, Harvey Mudd College in California was the most expensive college for the 2021-2022 academic year. There are three major costs of college: tuition, room, and board. The difference in on-campus and off-campus accommodation costs is often negligible, but they can change greatly depending on the college town.

    The differences between public and private colleges

    Public colleges, also called state colleges, are mostly funded by state governments. Private colleges, on the other hand, are not funded by the government but by private donors and endowments. Typically, private institutions are much more expensive. Public colleges tend to offer different tuition fees for students based on whether they live in-state or out-of-state, while private colleges have the same tuition cost for every student.

  9. Educational Attainment

    • catalog.dvrpc.org
    csv
    Updated Mar 17, 2025
    Cite
    DVRPC (2025). Educational Attainment [Dataset]. https://catalog.dvrpc.org/dataset/educational-attainment
    Available download formats: csv(355321), csv(5066), csv(2766), csv(12399), csv(2647), csv(6460), csv(1566), csv(233799)
    Dataset updated
    Mar 17, 2025
    Dataset authored and provided by
    DVRPC
    License

    https://catalog.dvrpc.org/dvrpc_data_license.html

    Description

    As part of the American Community Survey (ACS), the U.S. Census Bureau collects information regarding respondents' educational attainment. Educational attainment refers to the highest level of education that all individuals age 25 and older have completed. Response categories include no schooling completed; nursery school, grades 1 through 11; 12th grade but no diploma; regular high school diploma; GED or alternative credential; some college credit, but less than one year of college; one or more years of college credit, no degree; associate's degree; bachelor's degree; master's degree, professional degree beyond bachelor's degree; and doctorate degree. Data from the 2000 Decennial Census is also summarized.

  10. Data from: College Completion

    • kaggle.com
    zip
    Updated Feb 22, 2024
    Cite
    Laila Qadir Musib (2024). College Completion [Dataset]. https://www.kaggle.com/datasets/leilahasan/college-completion
    Available download formats: zip (10701318 bytes)
    Dataset updated
    Feb 22, 2024
    Authors
    Laila Qadir Musib
    Description

    DESCRIPTION

    College completion data from 3,800 degree-granting institutions in the United States

    SUMMARY

    Source of the data

    These data were pulled from the College Completion microsite produced by The Chronicle of Higher Education with support from the Bill & Melinda Gates Foundation. Its goal is to share data on completion rates in American higher education in a visually stimulating way. [Their] hope is that ... you will find your own stories in the statistics and use the tools [they] provide to download data files; share charts through your own presentations; and comment, start conversations, or provide tips about this important topic.

    Note: This text was adapted from the College Completion website's About page copy. Please visit http://collegecompletion.chronicle.com/about/ for more info

  11. Complete Education Details (116th U.S. Congress)

    • kaggle.com
    zip
    Updated Jun 6, 2020
    Cite
    pmohun (2020). Complete Education Details (116th U.S. Congress) [Dataset]. https://www.kaggle.com/philmohun/complete-education-details-116th-us-congress
    Available download formats: zip (23724 bytes)
    Dataset updated
    Jun 6, 2020
    Authors
    pmohun
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    United States
    Description

    Context

    This dataset contains complete education details for members of the 116th United States Congress, including universities, degrees, and political affiliations. I hope that this dataset is helpful for anyone who may wish to further investigate the correlation of education and academic credentials with policy decisions made by members of our government.

    Content

    Data was gathered by reviewing Wikipedia pages for members of the U.S. 116th Congress via: https://en.wikipedia.org/wiki/116th_United_States_Congress

    Acknowledgements

    Thank you Wikipedia. You are a modern miracle.

    Inspiration

    This dataset can be used to answer questions like:

    • Is political affiliation correlated with education?
    • What is the most common degree type for U.S. Senators?
    • What percentage of U.S. Congressmen dropped out of college?
    • Which college has the most representation in the House of Representatives?
    • What percentage of Congressmen are scientists?

    With some creative co-mingling, this dataset can be used to supplement research questions like:

    • Are policy decisions correlated with education?
    • Are there relationships associated with college affiliations and voting in Congress?
    • Can we find a relationship between hot button topics and education?
    • How is public sentiment influenced by education level?

  12. Pittsburgh American Community Survey 2015, School Enrollment

    • data.wprdc.org
    • datasets.ai
    • +2more
    csv, txt
    Updated Jun 7, 2024
    Cite
    City of Pittsburgh (2024). Pittsburgh American Community Survey 2015, School Enrollment [Dataset]. https://data.wprdc.org/dataset/pittsburgh-american-community-survey-2015-school-enrollment
    Available download formats: csv, txt
    Dataset updated
    Jun 7, 2024
    Dataset authored and provided by
    City of Pittsburgh
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Pittsburgh
    Description

    School enrollment data are used to assess the socioeconomic condition of school-age children. Government agencies also require these data for funding allocations and program planning and implementation.

    Data on school enrollment and grade or level attending were derived from answers to Question 10 in the 2015 American Community Survey (ACS). People were classified as enrolled in school if they were attending a public or private school or college at any time during the 3 months prior to the time of interview. The question included instructions to “include only nursery or preschool, kindergarten, elementary school, home school, and schooling which leads to a high school diploma, or a college degree.” Respondents who did not answer the enrollment question were assigned the enrollment status and type of school of a person with the same age, sex, race, and Hispanic or Latino origin whose residence was in the same or nearby area.

    School enrollment is only recorded if the schooling advances a person toward an elementary school certificate, a high school diploma, or a college, university, or professional school (such as law or medicine) degree. Tutoring or correspondence schools are included if credit can be obtained from a public or private school or college. People enrolled in “vocational, technical, or business school,” such as postsecondary vocational, trade, or hospital school, and on-the-job training, were not reported as enrolled in school. Field interviewers were instructed to classify individuals who were home schooled as enrolled in private school. The guide sent out with the mail questionnaire includes instructions for how to classify home schoolers.

    Enrolled in Public and Private School – Includes people who attended school in the reference period and indicated they were enrolled by marking one of the questionnaire categories for “public school, public college,” or “private school, private college, home school.” The instruction guide defines a public school as “any school or college controlled and supported primarily by a local, county, state, or federal government.” Private schools are defined as schools supported and controlled primarily by religious organizations or other private groups. Home schools are defined as “parental-guided education outside of public or private school for grades 1-12.” Respondents who marked both the “public” and “private” boxes are edited to the first entry, “public.”

    Grade in Which Enrolled – From 1999-2007, in the ACS, people reported to be enrolled in “public school, public college” or “private school, private college” were classified by grade or level according to responses to Question 10b, “What grade or level was this person attending?” Seven levels were identified: “nursery school, preschool;” “kindergarten;” elementary “grade 1 to grade 4” or “grade 5 to grade 8;” high school “grade 9 to grade 12;” “college undergraduate years (freshman to senior);” and “graduate or professional school (for example: medical, dental, or law school).”

    In 2008, the school enrollment questions changed in several ways. “Home school” was explicitly included in the “private school, private college” category. For Question 10b, the categories changed to the following: “Nursery school, preschool,” “Kindergarten,” “Grade 1 through grade 12,” “College undergraduate years (freshman to senior),” and “Graduate or professional school beyond a bachelor’s degree (for example: MA or PhD program, or medical or law school).” The survey question allowed a write-in for the grade enrolled, from 1 to 12.

    Question/Concept History – Since 1999, the ACS enrollment status question (Question 10a) refers to “regular school or college,” while the 1996-1998 ACS did not restrict reporting to “regular” school, and contained an additional category for the “vocational, technical or business school.” The 1996-1998 ACS used the educational attainment question to estimate level of enrollment for those reported to be enrolled in school, and had a single year write-in for the attainment of grades 1 through 11. Grade levels estimated using the attainment question were not consistent with other estimates, so a new question specifically asking grade or level of enrollment was added starting with the 1999 ACS questionnaire.

    Limitation of the Data – Beginning in 2006, the population universe in the ACS includes people living in group quarters. Data users may see slight differences in levels of school enrollment in any given geographic area due to the inclusion of this population. The extent of this difference, if any, depends on the type of group quarters present and whether the group quarters population makes up a large proportion of the total population. For example, in areas that are home to several colleges and universities, the percent of individuals 18 to 24 who were enrolled in college or graduate school would increase, as people living in college dormitories are now included in the universe.

  13. Cost of International Education

    • kaggle.com
    zip
    Updated May 7, 2025
    Cite
    Adil Shamim (2025). Cost of International Education [Dataset]. https://www.kaggle.com/datasets/adilshamim8/cost-of-international-education
    Explore at:
    Available download formats
    zip (18950 bytes)
    Dataset updated
    May 7, 2025
    Authors
    Adil Shamim
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Cost of International Education dataset compiles detailed financial information for students pursuing higher education abroad. It covers multiple countries, cities, and universities around the world, capturing the full tuition and living expenses spectrum alongside key ancillary costs. With standardized fields such as tuition in USD, living-cost indices, rent, visa fees, insurance, and up-to-date exchange rates, it enables comparative analysis across programs, degree levels, and geographies. Whether you’re a prospective international student mapping out budgets, an educational consultant advising on affordability, or a researcher studying global education economics, this dataset offers a comprehensive foundation for data-driven insights.

    Description

    | Column | Type | Description |
    | --- | --- | --- |
    | Country | string | ISO country name where the university is located (e.g., “Germany”, “Australia”). |
    | City | string | City in which the institution sits (e.g., “Munich”, “Melbourne”). |
    | University | string | Official name of the higher-education institution (e.g., “Technical University of Munich”). |
    | Program | string | Specific course or major (e.g., “Master of Computer Science”, “MBA”). |
    | Level | string | Degree level of the program: “Undergraduate”, “Master’s”, “PhD”, or other certifications. |
    | Duration_Years | integer | Length of the program in years (e.g., 2 for a typical Master’s). |
    | Tuition_USD | numeric | Total program tuition cost, converted into U.S. dollars for ease of comparison. |
    | Living_Cost_Index | numeric | A normalized index (often based on global city indices) reflecting relative day-to-day living expenses (food, transport, utilities). |
    | Rent_USD | numeric | Average monthly student accommodation rent in U.S. dollars. |
    | Visa_Fee_USD | numeric | One-time visa application fee payable by international students, in U.S. dollars. |
    | Insurance_USD | numeric | Annual health or student insurance cost in U.S. dollars, as required by many host countries. |
    | Exchange_Rate | numeric | Local currency units per U.S. dollar at the time of data collection; vital for currency conversion and trend analysis if rates fluctuate. |
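The column descriptions above suggest one way to roll the cost fields into a single budget figure. The sketch below is an illustration only: the `ProgramCosts` class, the `estimate_total_cost` function, and the formula itself are assumptions rather than part of the dataset. It treats tuition as a whole-program total, rent as monthly, insurance as annual, and the visa fee as one-time, as the descriptions state, and it leaves out Living_Cost_Index because that column is an index rather than a dollar amount.

```python
from dataclasses import dataclass


@dataclass
class ProgramCosts:
    """Cost-related columns of one row (hypothetical container, not from the dataset)."""
    duration_years: int    # Duration_Years
    tuition_usd: float     # Tuition_USD, described as the total program tuition
    rent_usd: float        # Rent_USD, described as average monthly rent
    visa_fee_usd: float    # Visa_Fee_USD, described as a one-time fee
    insurance_usd: float   # Insurance_USD, described as an annual cost


def estimate_total_cost(row: ProgramCosts) -> float:
    """Assumed budget formula: tuition + rent + visa fee + insurance."""
    rent_total = row.rent_usd * 12 * row.duration_years
    insurance_total = row.insurance_usd * row.duration_years
    return row.tuition_usd + rent_total + row.visa_fee_usd + insurance_total


# Made-up figures for illustration only; they are not taken from the dataset.
example = ProgramCosts(
    duration_years=2,
    tuition_usd=20000.0,
    rent_usd=1000.0,
    visa_fee_usd=100.0,
    insurance_usd=500.0,
)
total = estimate_total_cost(example)
print(f"Estimated total: ${total:,.2f}")
```

A real budget would likely add food, transport, and other day-to-day expenses, which this sketch does not attempt to derive from the index column.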

    Potential Uses

    • Budget Planning: Prospective students can filter by country, program level, or university to forecast total expenses and compare across destinations.
    • Policy Analysis: Educational policymakers and NGOs can assess the affordability of international education and design support programs.
    • Economic Research: Economists can correlate living-cost indices and tuition levels with enrollment rates or student demographics.
    • University Benchmarking: Institutions can benchmark their fees and ancillary costs against peer universities worldwide.

    Notes on Data Collection & Quality

    • Currency Conversions: All monetary values are unified to USD using contemporaneous exchange rates to facilitate direct comparison.
    • Living Cost Index: Derived from reputable city-index publications (e.g., Numbeo, Mercer) to standardize disparate cost-of-living metrics.
    • Data Currency: Exchange rates and fee schedules should be periodically updated to reflect market fluctuations and policy changes.

    Feel free to explore, visualize, and extend this dataset for deeper insights into the true cost of studying abroad!

  14. Education by County 2015

    • gstore.unm.edu
    Updated Mar 6, 2020
    + more versions
    Cite
    (2020). Education by County 2015 [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/4a0337cb-1859-4d18-b2a5-d83c026eed3a/metadata/ISO-19115:2003.html
    Explore at:
    Dataset updated
    Mar 6, 2020
    Time period covered
    2015
    Area covered
    West Bound -109.05017 East Bound -103.00196 North Bound 37.000293 South Bound 31.33217
    Description

    A broad and generalized selection of 2011-2015 US Census Bureau 2015 5-year American Community Survey education data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of educational attainment by grade level and sex (for all persons 25 years and older), plus enrollment estimates at key educational levels (for the universe of all persons 3+ years old). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time.

    The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.

  15. Indian International Students in the US

    • kaggle.com
    zip
    Updated Apr 28, 2025
    Cite
    Pushkar Joshi (2025). Indian International Students in the US [Dataset]. https://www.kaggle.com/datasets/pushkarjoshi17/indian-international-students-in-the-us
    Explore at:
    Available download formats
    zip (225937 bytes)
    Dataset updated
    Apr 28, 2025
    Authors
    Pushkar Joshi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧑‍🎓 Indian International Students in the US — Education, Jobs, and Visas 📚💼

    Dataset Overview

    This dataset provides a detailed and realistic simulation of Indian international students studying in the United States, their educational paths, job outcomes after graduation, university information, and visa approval statistics.

    It can be used for:

    • Education data analysis 📊
    • Employment trends for international graduates 👩‍💻
    • Machine learning projects (classification, regression) 🤖
    • Visa trend analysis 📈
    • Career planning studies for international students 🎯

    📂 Files Included

    | File Name | Description |
    | --- | --- |
    | indian_international_students_us.csv | Profiles of 10,000 Indian international students, including university, major, degree level, and study status. |
    | job_outcomes_indian_students_us.csv | Job outcomes for students who graduated, including job title, company, salary, visa status, and time to first job. |
    | universities_info_us.csv | Information about major US universities, including acceptance rates, GRE/TOEFL averages, and international student percentages. |
    | visa_approval_stats.csv | Yearly visa approval and denial rates for F1, OPT, and H1B visa types from 2015 to 2023. |

    ✨ Potential Project Ideas

    1. Predict job offer chances based on major, degree, and university.
    2. Analyze salary distributions by major, company, and visa status.
    3. Visualize visa approval trends over time.
    4. Build a career advisory tool for international students.

    ✨ Potential SQL Projects

    1. List all students studying in the "Computer Science" major.
    2. Count how many students are currently enrolled vs graduated.
    3. Find the top 5 universities with the highest number of students.
    4. Get the list of all students whose degree level is "Masters".
    5. Find the average salary of students who received a job offer.
    6. List all companies that hired at least one student.
    7. Find universities with an acceptance rate below 20%.
    8. Calculate the percentage of students in each degree level (Bachelors, Masters, PhD).
    9. Find the top 10 job titles offered to Indian students.
    10. List universities where the average GRE score is above 320.
    11. Rank students who got a job by salary.
    12. Find the visa approval rate for each visa type (F1, OPT, H1B) over the years.
    13. Build a report showing: university name, number of students, number of students who got jobs, average salary, and job offer rate (%).
    14. Identify majors with the highest average salaries after graduation.
    15. Compare visa approval trends: how have F1, OPT, and H1B approval rates changed from 2015 to 2023?
    16. Create a view showing students with the highest probability of getting a job based on major, university, and degree level.
    17. Predict (with SQL logic): if a new student graduates from [University X] with [Major Y] and [Degree Level Z], what is their expected salary range?
    18. Cohort analysis: for students who graduated in a particular year, how many got jobs within 6 months?
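As an illustration of the per-university report idea in the list above (university name, number of students, number who got jobs, average salary, and job offer rate), here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names (`student_id`, `university`, `major`, `salary`), and sample rows are hypothetical stand-ins; the actual CSV headers are not shown in the listing and may differ. The sketch also assumes at most one job-outcome row per student.

```python
import sqlite3

# In-memory database with tiny made-up tables; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, university TEXT, major TEXT);
    CREATE TABLE job_outcomes (student_id INTEGER, salary REAL);
    INSERT INTO students VALUES
        (1, 'University A', 'Computer Science'),
        (2, 'University A', 'Data Science'),
        (3, 'University B', 'Computer Science'),
        (4, 'University B', 'Mechanical Engineering');
    INSERT INTO job_outcomes VALUES (1, 100000), (2, 80000), (3, 90000);
""")

# LEFT JOIN keeps students without a job row; COUNT and AVG skip the resulting NULLs.
report_sql = """
    SELECT s.university,
           COUNT(s.student_id) AS n_students,
           COUNT(j.student_id) AS n_with_job,
           AVG(j.salary) AS avg_salary,
           100.0 * COUNT(j.student_id) / COUNT(s.student_id) AS job_offer_rate_pct
    FROM students AS s
    LEFT JOIN job_outcomes AS j ON j.student_id = s.student_id
    GROUP BY s.university
    ORDER BY s.university
"""
report = conn.execute(report_sql).fetchall()
for row in report:
    print(row)
```

With the real files, the same query shape could be reused after loading the CSVs into tables, adjusting the join key and column names to whatever the files actually contain.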

    ⚡ Important Note This dataset is synthetic but designed to be realistic based on trends among Indian students studying abroad. No real personal information is included. Great for educational, research, and portfolio purposes.

    🔖 Acknowledgment Generated by Pushkar Joshi using simulated data sources. Inspired by real-world patterns and publicly available educational statistics.

    🏷️ Suggested Tags

    #education, #students, #international-students, #jobs, #visas, #synthetic-data, #data-science, #kaggle-datasets

  16. American Community Survey

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Mar 6, 2020
    + more versions
    Cite
    Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/46695c3c-4b5c-499e-8cd4-3171623d053a/metadata/FGDC-STD-001-1998.html
    Explore at:
    Available download formats
    gml(5), geojson(5), shp(5), zip(1), json(5), kml(5), xls(5), csv(5)
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    2017
    Area covered
    New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
    Description

    A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey education data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of educational attainment by grade level and sex (for all persons 25 years and older), plus enrollment estimates at key educational levels (for the universe of all persons 3+ years old). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time.

    The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.

  17. American Community Survey

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Mar 6, 2020
    + more versions
    Cite
    Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/fbe60d76-af16-427c-a731-897b0525a23d/metadata/FGDC-STD-001-1998.html
    Explore at:
    Available download formats
    csv(5), xls(5), zip(1), json(5), shp(5), geojson(5), gml(5), kml(5)
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    2018
    Area covered
    New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
    Description

    A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey education data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of educational attainment by grade level and sex (for all persons 25 years and older), plus enrollment estimates at key educational levels (for the universe of all persons 3+ years old). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time.

    The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.

  18. USStateEducationAnalysisForTechProductLaunch

    • kaggle.com
    zip
    Updated Aug 7, 2025
    Cite
    Arnab Gupta (2025). USStateEducationAnalysisForTechProductLaunch [Dataset]. https://www.kaggle.com/datasets/itzivision/usstateeducationanalysisfortechproductlaunch/code
    Explore at:
    Available download formats
    zip (53545 bytes)
    Dataset updated
    Aug 7, 2025
    Authors
    Arnab Gupta
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    US State Education Analysis for Tech Product Launch

    About This Dataset

    This comprehensive dataset provides detailed educational attainment and demographic analysis across all 50 US states from 2021-2023, specifically designed for tech companies planning strategic market entry and product launch decisions.

    Dataset Overview

    • 150 rows of data (50 states × 3 years)
    • 17 columns of educational, demographic, and economic indicators
    • Complete coverage of all US states from 2021-2023
    • Ready-to-analyze format with calculated percentages and rankings

    Key Features

    🎯 Strategic Market Intelligence

    • Educational attainment levels by degree type (Bachelor's, Master's, Professional, Doctoral)
    • Calculated education scores and state rankings for quick market prioritization
    • Median household income data for purchasing power assessment

    📊 Comprehensive Demographics

    • Population data for adults 25+ (primary tech consumer demographic)
    • Household count data for market sizing
    • College graduate percentages for targeted marketing

    🔢 Advanced Analytics Ready

    • Pre-calculated composite education scores
    • State rankings based on education levels
    • Percentage breakdowns for immediate insights

    Column Definitions

    | Column Name | Data Type | Description | Example Value |
    | --- | --- | --- | --- |
    | NAME | String | Full US state name | "Massachusetts" |
    | total_population_25plus | Integer | Total population aged 25 and above | 4,975,152 |
    | bachelors_degree | Integer | Number of individuals with bachelor's degrees | 1,261,847 |
    | masters_degree | Integer | Number of individuals with master's degrees | 788,243 |
    | professional_degree | Integer | Number of individuals with professional degrees (JD, MD, etc.) | 157,762 |
    | doctoral_degree | Integer | Number of individuals with doctoral degrees (PhD, EdD, etc.) | 169,357 |
    | median_household_income | Integer | Median household income in USD | $99,858 |
    | total_households | Float | Total number of households (in millions) | 2.41 |
    | state | Integer | Numeric state identifier (1-50) | 25 |
    | year | Integer | Data collection year | 2023 |
    | college_graduates | Integer | Total college graduates (bachelor's + advanced degrees) | 2,377,209 |
    | college_graduate_percentage | Float | Percentage of population with college degrees | 47.78% |
    | graduate_degree_holders | Integer | Total with master's, professional, or doctoral degrees | 1,115,362 |
    | graduate_degree_percentage | Float | Percentage with graduate-level degrees | 22.42% |
    | advanced_degree_percentage | Float | Percentage with professional or doctoral degrees | 3.40% |
    | education_score | Float | Composite education ranking score | 28.76 |
    | education_rank | Integer | State ranking based on education score (1-50, 1=highest) | 1 |
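The derived columns in the table above can be cross-checked from the raw counts. The sketch below assumes the example values all come from one row (Massachusetts, 2023) and that each percentage is taken relative to `total_population_25plus`; neither assumption is stated in the listing, and the `pct` helper is hypothetical. The formulas for `advanced_degree_percentage` and `education_score` are not given, so they are not reproduced here.

```python
# Example values copied from the column-definition table (assumed to be one row).
row = {
    "total_population_25plus": 4_975_152,
    "bachelors_degree": 1_261_847,
    "masters_degree": 788_243,
    "professional_degree": 157_762,
    "doctoral_degree": 169_357,
}


def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to 2 places."""
    return round(100.0 * part / whole, 2)


# Master's + professional + doctoral degrees.
graduate_degree_holders = (
    row["masters_degree"] + row["professional_degree"] + row["doctoral_degree"]
)
# Bachelor's degrees plus all graduate-level degrees.
college_graduates = row["bachelors_degree"] + graduate_degree_holders

college_graduate_percentage = pct(college_graduates, row["total_population_25plus"])
graduate_degree_percentage = pct(graduate_degree_holders, row["total_population_25plus"])

print(college_graduates, college_graduate_percentage)
print(graduate_degree_holders, graduate_degree_percentage)
```

Under these assumptions the computed totals and percentages agree with the example values listed for `college_graduates`, `college_graduate_percentage`, `graduate_degree_holders`, and `graduate_degree_percentage`.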

    Use Cases

    🚀 Tech Product Launches

    • Identify states with highest concentrations of educated early adopters
    • Prioritize markets based on education levels and income
    • Size potential customer segments by state

    📈 Market Research & Analysis

    • Compare educational demographics across regions
    • Analyze trends in educational attainment over time
    • Correlate education levels with income potential

    🎯 Customer Segmentation

    • Target high-value customer segments (graduate degree holders)
    • Develop region-specific marketing strategies
    • Plan B2B tech sales territories

    📊 Business Intelligence

    • Regional expansion planning
    • Competitive market analysis
    • Investment and resource allocation decisions

    Data Quality & Sources

    • Primary Sources: US Census Bureau American Community Survey (ACS), Bureau of Labor Statistics
    • Data Validation: Cross-referenced against multiple official sources
    • Calculation Methodology: All percentages and scores calculated using consistent formulas
    • Update Frequency: Annual updates as new official data becomes available

    Sample Insights

    The dataset reveals that Massachusetts consistently ranks #1 in education metrics with:

    • 47.78% college graduation rate (2023)
    • 22.42% graduate degree holders
    • $99,858 median household income
    • Education score of 28.76

    Perfect for identifying premium tech markets and highly-educated consumer bases for sophisticated technology products.

    This dataset is ideal for data scientists, market researchers, business analysts, and tech companies looking to make data-driven decisions about market entry, customer targeting, and regional strategy.

  19. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Silicon Orchard Lab, Bangladesh
    Independent University, Bangladesh
    University of Memphis, USA
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, and compared several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been devoted to studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All of these newspapers are among the top providers and most widely read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were highly likely to collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some of the criteria given as guidelines to the human data-collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for any spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.
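The four pre-processing steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' script: the stop-word set and the lemma lookup table are tiny hypothetical stand-ins for a full NLP library, "non-English alphanumeric characters" is interpreted as anything other than ASCII letters, digits, and whitespace, and lowercasing is an added step so that stop-word matching is case-insensitive.

```python
import re

# Tiny stand-ins so the sketch runs without external downloads (assumptions).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to", "for"}
LEMMA_TABLE = {"cases": "case", "rising": "rise", "hospitals": "hospital"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ASCII_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text: str) -> str:
    """Apply the four listed steps in order and return the cleaned text."""
    text = URL_RE.sub(" ", text)                           # 1. remove hyperlinks
    text = NON_ASCII_ALNUM_RE.sub(" ", text)               # 2. keep ASCII letters/digits only
    tokens = text.lower().split()                          # lowercasing is an added assumption
    tokens = [t for t in tokens if t not in STOP_WORDS]    # 3. remove stop words
    tokens = [LEMMA_TABLE.get(t, t) for t in tokens]       # 4. dictionary-lookup lemmatization
    return " ".join(tokens)


sample = "COVID-19 cases are rising in the hospitals: see https://example.com/report"
cleaned = preprocess(sample)
print(cleaned)
```

In practice the stop-word list and lemmatizer would come from an NLP library; the lookup table here only shows where that step sits in the pipeline.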

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could cause valuable information loss. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository, under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
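Regenerating JSON from CSV needs little more than the standard `csv` and `json` modules. The sketch below is an assumption about how such a conversion could look, not the repository's actual script; the `headline` and `body` column names are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(rows, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    sample = "headline,body\nVirus spreads,Officials urge caution\n"
    print(csv_to_json(sample))
```

Each CSV row becomes one JSON object keyed by the header names, so the two formats stay aligned as long as the JSON is regenerated after every CSV update.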

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, using diverse methods such as one-hot encoding and word embedding to transform text into a numeric, machine-readable representation that can be fed to machine learning and deep learning algorithms.
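As a small illustration of one-hot encoding, the sketch below maps each distinct word to an index and represents a word as a vector with a single 1 at that index. The function names are hypothetical and chosen only for this example.

```python
def build_vocabulary(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a vector of zeros with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


if __name__ == "__main__":
    vocab = build_vocabulary(["virus", "mask", "virus", "economy"])
    print(one_hot("mask", vocab))
```

The vector length equals the vocabulary size, which is why one-hot encoding becomes sparse and costly for large vocabularies and why word embeddings are often preferred.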

    Some well-known applications of NLP include fraud detection on online media sites[10], authorship attribution in fallback authentication systems[11], intelligent conversational agents or chatbots[12], and the machine translation used by Google Translate[13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT[14], which uses a bidirectional Transformer encoder and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI[15], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computation cost.

    Information Extraction is the general concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; information extraction from speech could mean retrieving names, places, etc.[16]. Information extraction from text could mean identifying named entities, locations, or other essential data.

    Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives a brief idea of what a set of texts is about. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA[17].

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank[18] is an efficient keyword extraction technique that builds a graph over the words of a text, computes a weight for each word, and picks the words with the highest weights.
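The graph-based idea behind TextRank can be sketched as follows: words that occur close together are linked, and a PageRank-style iteration assigns each word a score. This is a simplified, unweighted sketch under assumed parameters (`window`, `damping`, `iterations`, and `top_k` are illustrative defaults), not a reference implementation.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score over a co-occurrence graph."""
    # Link each word to the words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    words = sorted(neighbors)
    if not words:
        return []

    # Every word in the graph has at least one neighbor, so the
    # division by len(neighbors[n]) below is always safe.
    scores = {w: 1.0 for w in words}
    for _ in range(iterations):
        new_scores = {}
        for w in words:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            new_scores[w] = (1 - damping) + damping * incoming
        scores = new_scores

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(words, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = ["virus", "mask", "virus", "economy", "virus", "election"]
    print(textrank_keywords(sample, window=1, top_k=1))
```

In practice the tokens would first be filtered (for example, stop words removed) so that the graph links only content-bearing words.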

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can note a few points:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, it seemed to be even more concerned.

    Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

    The Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.
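Month-by-month word clouds like these are driven by per-month word frequencies. The sketch below shows one way to compute such frequencies with the standard library; the `(month, text)` input shape is an assumption made for illustration, and the actual rendering with the wordcloud library is not shown.

```python
from collections import Counter, defaultdict


def monthly_word_counts(reports):
    """Count word occurrences per month from (month, text) pairs."""
    counts = defaultdict(Counter)
    for month, text in reports:
        counts[month].update(text.lower().split())
    return counts


if __name__ == "__main__":
    # Hypothetical sample reports, for illustration only.
    reports = [
        ("February", "China outbreak China"),
        ("March", "economy masks economy"),
    ]
    counts = monthly_word_counts(reports)
    print(counts["February"].most_common(1))
```

The most frequent words in each month are the ones a word cloud would draw largest.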

    We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, as the count rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,

  20. Complete Education Details (117th U.S. Senate)

    • kaggle.com
    zip
    Updated Apr 19, 2021
    Cite
    Adam Shl (2021). Complete Education Details (117th U.S. Senate) [Dataset]. https://www.kaggle.com/adamshl/complete-education-details-117th-us-congress
    Explore at:
    Available download formats: zip (212143 bytes)
    Dataset updated
    Apr 19, 2021
    Authors
    Adam Shl
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    United States
    Description

    Dataset

    This dataset was created by Adam Shl

    Released under Database: Open Database, Contents: Database Contents

    Contents

Cite
Jackson Júnior (2023). Higher Education Institutions in the USA [Dataset]. https://www.kaggle.com/datasets/jacksonbarreto/higher-education-institutions-in-the-usa/data

Higher Education Institutions in the USA

Public and Private Universities' Information

Explore at:
Available download formats: zip (35907 bytes)
Dataset updated
Apr 8, 2023
Authors
Jackson Júnior
License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered
United States
Description

Higher Education Institutions in the United States of America Dataset

This repository contains a dataset of higher education institutions in the United States of America. This dataset was compiled for a cybersecurity study of the websites of American higher education institutions [1]. The data is being made publicly available to promote open science principles [2].

Data

The data includes the following fields for each institution:

  • Id: A unique identifier assigned to each institution.
  • Region: The federal state in which the institution is located.
  • Name: The full name of the institution.
  • Category: Indicates whether the institution is public or private.
  • Url: The website of the institution.

Methodology

The dataset was obtained from the Integrated Postsecondary Education Data System (IPEDS) website [3], which is administered by the National Center for Education Statistics (NCES). NCES serves as the primary federal entity for collecting and analyzing education-related data in the United States. The data was collected on February 2, 2023.

The initial list of institutions was derived from the IPEDS database using the following criteria: (1) US institutions only, (2) degree-granting institutions, primarily bachelor's or higher, and (3) sector classification, which includes: public 4 years or more, private not-for-profit 4 years or more, private for-profit 4 years or more, public 2 years, private not-for-profit 2 years, private for-profit 2 years, public less than 2 years, private not-for-profit less than 2 years, and private for-profit less than 2 years.

The following variables have been added to the list of institutions: Control of the institution, state abbreviation, degree-granting status, Status of the institution, and Institution's internet website address. This resulted in a report with 1,979 institutions.

The institution's status was labeled with the following values: A (Active), N (New), R (Restored), M (Closed in the current year), C (Combined with another institution), D (Deleted out of business), I (Inactive due to hurricane-related issues), O (Outside IPEDS scope), P (Potential new/add institution), Q (Potential institution reestablishment), W (Potential addition outside IPEDS scope), X (Potential restoration outside the scope of IPEDS), and G (Perfect Children's Campus).

A filter was applied to the report to retain only institutions with an A, N, or R status, resulting in 1,978 institutions. Finally, a data cleaning process was applied, which involved removing the whitespace at the beginning and end of cell content and duplicate whitespace. The final data were compiled into the dataset included in this repository.
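The status filter and whitespace cleanup described above can be sketched as follows. This is an illustrative sketch rather than the authors' actual process: the `Status` and `Name` column names are assumptions, and rows are modeled as plain dictionaries.

```python
import re

# Status codes retained by the filter: Active, New, Restored.
ACTIVE_STATUSES = {"A", "N", "R"}


def clean_cell(value):
    """Strip leading/trailing whitespace and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", value).strip()


def filter_and_clean(rows):
    """Clean every cell, then keep only rows with an A, N, or R status."""
    cleaned = []
    for row in rows:
        row = {key: clean_cell(val) for key, val in row.items()}
        # "Status" is a hypothetical column name for the institution status code.
        if row.get("Status") in ACTIVE_STATUSES:
            cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"Name": "  Example   University ", "Status": "A"},
        {"Name": "Closed College", "Status": "M"},
    ]
    print(filter_and_clean(sample))
```

Cells are cleaned before the status check so that a status value padded with stray whitespace is still matched.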

Usage

This data is available under the Creative Commons Zero (CC0) license and can be used for any purpose, including academic research purposes. We encourage the sharing of knowledge and the advancement of research in this field by adhering to open science principles [2].

If you use this data in your research, please cite the source and include a link to this repository. To properly attribute this data, please use the following DOI: 10.5281/zenodo.7614862

Contribution

If you have any updates or corrections to the data, please feel free to open a pull request or contact us directly. Let's work together to keep this data accurate and up-to-date.

Acknowledgment

We would like to acknowledge the support of the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), within the project "Cybers SeC IP" (NORTE-01-0145-FEDER-000044). This study was also developed as part of the Master in Cybersecurity Program at the Instituto Politécnico de Viana do Castelo, Portugal.

References

  1. Pending.
  2. S. Bezjak, A. Clyburne-Sherin, P. Conzett, P. Fernandes, E. Görögh, K. Helbig, B. Kramer, I. Labastida, K. Niemeyer, F. Psomopoulos, T. Ross-Hellauer, R. Schneider, J. Tennant, E. Verbakel, H. Brinken, and L. Heller, Open Science Training Handbook. Zenodo, Apr. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1212496
  3. Integrated Postsecondary Education Data System, "Compare Institutions", Feb. 2023. [Online]. Available: https://nces.ed.gov/ipeds/use-the-data