https://creativecommons.org/publicdomain/zero/1.0/
Artificial intelligence is now used in many domains, and the legal system is no exception. However, few well-annotated datasets of legal documents from the Supreme Court of the United States (SCOTUS) are publicly available. Although Supreme Court rulings are in the public domain, doing meaningful work with them is far harder than it should be because the data must be gathered and processed manually from scratch each time. Our goal is therefore to create a high-quality dataset of SCOTUS cases that can be readily used in natural language processing (NLP) research and other data-driven applications. Recent advances in NLP also provide the tools to build predictive models that can reveal patterns influencing court decisions. Trained on previous court cases, such models can predict and classify a court's judgment from the textual facts of the case as presented by the plaintiff and the defendant; in other words, the model emulates a human jury by generating a final verdict.
The dataset contains 3304 cases from the Supreme Court of the United States from 1955 to 2021. Each case has the case's identifiers as well as the facts of the case and the decision outcome. Related datasets rarely include the facts of the case, which could prove helpful in natural language processing applications. One potential use of this dataset is determining the outcome of a case from its facts.
Target variable: First Party Winner. If true, the first party won; if false, the second party won. Use NLP techniques to build features from the facts column.
research team's jupyter notebook: click here
Mohammad Alali, Shaayan Syed, Mohammed Alsayed, Smit Patel, Hemanth Bodala
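As a rough illustration of building features from the facts column, the sketch below computes TF-IDF weights using only the Python standard library. The two sample rows, the field names `facts` and `first_party_winner`, and the tokenizer are assumptions made for illustration; they are not taken from the dataset or from the research team's notebook.

```python
import math
import re
from collections import Counter

# Hypothetical rows; the real dataset's column names and contents may differ.
cases = [
    {"facts": "The petitioner challenged the state statute.", "first_party_winner": True},
    {"facts": "The respondent appealed the tax ruling.", "first_party_winner": False},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


features = tfidf([case["facts"] for case in cases])
labels = [case["first_party_winner"] for case in cases]
```

Each entry of `features` could then be passed, together with the matching entry of `labels`, to whichever classifier is being evaluated.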
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Supreme Court database is the definitive source for researchers, students, journalists, and citizens interested in the United States Supreme Court. The database contains more than two hundred variables regarding each case decided by the Court between the 1946 and 2015 terms. Examples include the identity of the court whose decision the Supreme Court reviewed, the parties to the suit, the legal provisions considered in the case, and the votes of the Justices. The database codebook is available here.
The database was compiled by Professor Spaeth of Washington University Law and funded with a grant from the National Science Foundation.
Investigator(s): Harold J. Spaeth, James L. Gibson, Michigan State University
This data collection encompasses all aspects of United States Supreme Court decision-making from the beginning of the Warren Court in 1953 up to the completion of the 1995 term of the Rehnquist Court on July 1, 1996, including any decisions made afterward but before the start of the 1996 term on October 7, 1996. In this collection, distinct aspects of the court's decisions are covered by six types of variables: (1) identification variables including case citation, docket number, unit of analysis, and number of records per unit of analysis, (2) background variables offering information on origin of case, source of case, reason for granting cert, parties to the case, direction of the lower court's decision, and manner in which the Court takes jurisdiction, (3) chronological variables covering date of term of court, chief justice, and natural court, (4) substantive variables including multiple legal provisions, authority for decision, issue, issue areas, and direction of decision, (5) outcome variables supplying information on form of decision, disposition of case, winning party, declaration of unconstitutionality, and multiple memorandum decisions, and (6) voting and opinion variables pertaining to the vote in the case and to the direction of the individual justices' votes.
Years Produced: Annually
This data collection is an expanded version of UNITED STATES SUPREME COURT JUDICIAL DATABASE, 1953-1996 TERMS (ICPSR 9422), encompassing all aspects of United States Supreme Court decision-making from the beginning of the Vinson Court in 1946 to the end of the Warren Court in 1968. Two major differences distinguish the expanded version of the database from the original collection: the addition of data on the decisions of the Vinson Court, and the inclusion of the conference votes of the Vinson and Warren courts. Whereas the original collection contained only the vote as reported in the UNITED STATES SUPREME COURT REPORTS, the expanded database includes all votes cast in conference. Concomitant with the expansion of the database is a shift in its basic unit of analysis. The original collection contained every case in which at least one justice wrote an opinion, and cases without opinions were excluded. This version includes every case in which the Court cast a conference vote, with and without opinions. The justices cast many more votes than they wrote opinions, and hence, the number of Warren Court records in this version increased by more than a factor of two over the original version.
As in the original collection, distinct aspects of the Court's decisions are covered by six types of variables: (1) identification variables including case citation, docket number, unit of analysis, and number of records per unit of analysis, (2) background variables offering information on origin of case, source of case, reason for granting cert, parties to the case, direction of the lower court's decision, and manner in which the Court takes jurisdiction, (3) chronological variables covering date of term of court, chief justice, and natural court, (4) substantive variables including multiple legal provisions, authority for decision, issue, issue areas, and direction of decision, (5) outcome variables supplying information on form of decision, disposition of case, winning party, declaration of unconstitutionality, and multiple memorandum decisions, and (6) voting and opinion variables pertaining to the vote in the case and to the direction of the individual justices' votes.
https://creativecommons.org/publicdomain/zero/1.0/
Pendency of Court Cases in India
India has the largest number of pending court cases in the world. Many judges and government officials have said that the pendency of cases is the biggest challenge before the Indian judiciary. According to a 2018 NITI Aayog strategy paper, at the then-prevailing rate of disposal of cases in Indian courts, it would take more than 324 years to clear the backlog.
Pendency of court cases in India is the delay in the disposal of cases (lawsuits) by judicial courts at all levels, which in turn delays justice for the aggrieved person or organization. The judiciary in India works in a hierarchy of three levels: the Supreme Court, the state or high courts, and the district courts. Court cases are categorized into two types: civil and criminal. In 2022, the total number of pending cases of all types and at all levels rose to 50 million (5 crore), including over 169,000 court cases pending for more than 30 years in district and high courts.
Causes of pendency
State-wise statistics
Courthall shortfall is calculated as the lack of courthalls as a percentage of the total sanctioned strength of judges. A negative percentage means courthalls are in excess. Case clearance rate (CCR) is the number of cases disposed in a year as a percentage of new cases filed in the same year. A CCR below 100 means case pendency will increase, a CCR equal to 100 means it will remain the same, and a CCR above 100 means it will decrease. NA: Not Available. (Source: India Justice Report, 2022)
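The two ratios defined above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the shortfall formula is one reading of the definition (sanctioned judges minus available courthalls, divided by sanctioned judges), not a formula quoted from the India Justice Report.

```python
def case_clearance_rate(disposed, filed):
    """Cases disposed in a year as a percentage of new cases filed that year."""
    if filed <= 0:
        raise ValueError("filed must be positive")
    return 100.0 * disposed / filed


def courthall_shortfall(courthalls, sanctioned_judges):
    """Missing courthalls as a percentage of sanctioned judge strength.

    Assumed formula: a negative result means courthalls are in excess.
    """
    if sanctioned_judges <= 0:
        raise ValueError("sanctioned_judges must be positive")
    return 100.0 * (sanctioned_judges - courthalls) / sanctioned_judges


def pendency_trend(ccr):
    """Interpret a CCR value: below 100 the backlog grows, above 100 it shrinks."""
    if ccr < 100:
        return "increase"
    if ccr > 100:
        return "decrease"
    return "unchanged"
```

For example, a court that disposes of 90 cases against 100 new filings has a CCR of 90, so its pendency grows.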
https://dataful.in/terms-and-conditions
This dataset presents detailed statistics on court case disposals in India, categorized by the case type and the registration status. It includes data categorized by age of the case at the time of disposal. It captures the absolute number of cases disposed within various time brackets, ranging from within 1 year to more than 21 years.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information about appeal cases heard at the Supreme Court of Nigeria (SCN) between 1962 and 2022. It was extracted from case files provided by The Prison Law Pavillion, a data archiving firm in Nigeria. The dataset originally consisted of documentation of the various appeal cases alongside the outcome of the SCN judgment. Feature extraction techniques were used to generate a structured dataset of annotated features. The dataset consists of 14 features, including the outcome of the judgment. The other 13 features are input variables, of which 4 are stored as strings and the remaining 9 as numeric values. Missing numeric values are represented by the value -1. Unsupervised and supervised machine learning algorithms can be applied to the dataset to better understand the relationships among the features and to predict the target class, which is the outcome of the SCN judgment.
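Because missing numeric values are encoded as -1, they need to be masked before computing statistics or training a model. The sketch below shows one way to do that in plain Python; the helper names are hypothetical and the sample values are invented.

```python
MISSING = -1  # sentinel used for missing numeric values in the dataset


def clean_numeric(value):
    """Map the -1 sentinel to None so it is not treated as a real measurement."""
    return None if value == MISSING else value


def mean_ignoring_missing(values):
    """Average the values that are present; return None if all are missing."""
    present = [v for v in values if v != MISSING]
    return sum(present) / len(present) if present else None
```

Averaging `[3, -1, 5]` naively would give about 2.33, whereas `mean_ignoring_missing` returns 4.0 because the sentinel is skipped.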
https://spdx.org/licenses/CC0-1.0.html
This dataset contains written opinions from the 11 numbered Courts of Appeals and the DC Circuit Court of Appeals (not including the Federal Circuit), as well as the SCOTUS. It also contains metadata pertaining to each opinion, such as author, year, etc. It also contains the processed outputs of the rDIM model (Gerow et al. 2018) pertaining to the experiments performed in our paper. These results contain the assigned influence and topic distribution for each case.
Methods
The data was curated from four main sources:
Harvard Caselaw Access Project (case.law)
Federal Judicial Center (FJC) list of judges
A list of federal appeals court cases selected for review, as well as their corresponding SCOTUS opinions, from Livermore et al., "The Supreme Court and the Judicial Genre"
The Supreme Court Database (SCDB)
The opinions were cleaned using standard text cleaning techniques. The authors were deduced by performing regular expression matches between the noisy Caselaw author field and a list of judges from the FJC.
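A minimal sketch of the author-matching idea, assuming the judge list is reduced to surnames and that a whole-word, case-insensitive match is sufficient. The surnames and the function name are hypothetical; the authors' actual expressions are not given in the description.

```python
import re

# Hypothetical surnames; the real list would come from the FJC data.
judges = ["Posner", "Easterbrook", "Wood"]


def match_author(noisy_field, judge_names):
    """Return the first judge whose surname appears as a whole word, else None."""
    for name in judge_names:
        pattern = rf"\b{re.escape(name)}\b"
        if re.search(pattern, noisy_field, flags=re.IGNORECASE):
            return name
    return None
```

The word boundaries keep `"Wood"` from matching inside `"WOODS"`, which is the kind of false positive a plain substring test would produce.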
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
📌For most researchers and all applications which do not require accessing the judgment texts, we recommend using the standard .csv export.
📌A PDF file has been uploaded alongside the CSV file; it provides guidance and further details about the dataset and its columns.
Description
The Swiss Federal Supreme Court Dataset (SCD) provides a record of all 118,443 cases decided by the Swiss Federal Supreme Court between 2007 and September 2023. The SCD includes 31 variables that document basic case information, the court composition, the area of law, information about the appealed judgment, the parties, the case outcome, and about citations and publication status.
The dataset can be used as data infrastructure for both qualitative and quantitative analysis of Federal Supreme Court jurisprudence. It is generated using a fully automated pipeline and will be updated quarterly until at least 2025 to include the latest judgments and possible expansions.
Number of instances: 118443
Number of attributes: 31
A brief description of the columns:
✅docref: Reference to the document
✅url: Web address or link associated with the document
✅date: Date related to the document
✅year: Year related to the document
✅proc_type: Type of judicial process
✅merged_cases: Number of merged cases
✅division: Division or department division
✅division_type: Type of division or department division
✅n_judges: Number of judges
✅language: Document language
✅length: Length of the document
✅area_general: General topic
✅area_intermediate: Intermediate topic
✅area_detailed: Detailed topic
✅topic: Topic
✅issue: Issue
✅source_date: Source date
✅source_canton: Source canton
✅proc_duration: Duration of the judicial process
✅app_class: Applicant class
✅app_represented: Applicant represented type
✅resp_class: Respondent class
✅resp_represented: Respondent represented type
✅outcome: Outcome
✅outcome_binary: Binary outcome
✅cited_bger: Citation to the Swiss Federal Court
✅n_cited_bger: Number of citations to the Swiss Federal Court
✅cited_bge: Citation to the Swiss Civil Court
✅n_cited_bge: Number of citations to the Swiss Civil Court
✅leading_case: Precedent case
✅doi_version: Digital Object Identifier (DOI) version
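As a sketch of how the .csv export might be used, the snippet below reads a few rows and computes the share of successful outcomes per year. The column names `year` and `outcome_binary` come from the list above; the sample rows and the assumption that `outcome_binary` is coded as 0 or 1 are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for the .csv export; the values are invented for illustration.
sample_csv = """docref,year,language,outcome_binary
A,2007,de,1
B,2007,fr,0
C,2008,de,1
"""


def success_rate_by_year(csv_file):
    """Return {year: share of rows with outcome_binary == 1} (0/1 coding assumed)."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for row in csv.DictReader(csv_file):
        year = int(row["year"])
        totals[year] += 1
        wins[year] += int(row["outcome_binary"])
    return {year: wins[year] / totals[year] for year in sorted(totals)}


rates = success_rate_by_year(io.StringIO(sample_csv))
```

With the real export, the `io.StringIO` object would be replaced by an open file handle.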
https://www.icpsr.umich.edu/web/ICPSR/studies/9422/terms
This data collection encompasses all aspects of United States Supreme Court decision-making from the beginning of the Warren Court in 1953 to the completion of the most recent term of the Rehnquist Court. In this collection, distinct aspects of the Court's decisions are covered by six types of variables: (1) identification variables including citations and docket numbers, (2) background variables offering information on how the Court took jurisdiction, origin and source of case, and the reason the Court granted cert, (3) chronological variables covering date of decision, Court term, and natural court, (4) substantive variables including legal provisions, issues, and direction of decision, (5) outcome variables supplying information on disposition of case, winning party, formal alteration of precedent, and declaration of unconstitutionality, and (6) voting and opinion variables pertaining to how individual justices voted, their opinions and interagreements, and the direction of their votes.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Indian Supreme Court Judgements Chunked
Executive Summary
The dataset aims to address the chronic backlog in the Indian judiciary system, particularly in the Supreme Court, by creating a dataset optimized for legal language models (LLMs). The dataset will consist of pre-processed, chunked, and embedded textual data derived from the Supreme Court's judgment PDFs.
Problem and Importance - Motivation
Indian courts are overwhelmed with pending cases, with the… See the full description on the dataset page: https://huggingface.co/datasets/vihaannnn/Indian-Supreme-Court-Judgements-Chunked.
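The description mentions chunked text without saying how the chunks were produced. The sketch below shows one common approach, fixed-size word windows with a small overlap; the function name, chunk size, and overlap are assumptions, not the dataset author's settings.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighbouring chunks, which tends to help retrieval.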
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a comprehensive collection of Supreme Court of India judgments from 1950 to early 2025, covering approximately 98% of the documents available on the Indian Kanoon website under the Supreme Court judgments section.
26,000 PDF files of judgments spanning over 75 years.
Rich legal language suitable for Legal NLP, RAG systems, Summarization, Classification, and Legal Information Retrieval.
Each file represents an official judgment document delivered by the Supreme Court of India.
Source: Scraped and compiled from Indian Kanoon.
Coverage: ~98% of the Supreme Court judgments available on the Indian Kanoon website as of early 2025.
Legal Language Modeling and Pretraining
Retrieval-Augmented Generation (RAG) for Law
Legal Document Summarization
Case Similarity & Legal Analytics
Timeline-based legal precedent analysis
Legal AI researchers
Law and public policy scholars
NLP practitioners working on domain-specific language models
Developers building legal chatbots or legal tech products
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Theories of the relationship between the Supreme Court and the public assume that the public can potentially monitor the Court's behavior. We seek to measure the impact of Court decisions on public awareness of its cases. Public awareness of cases varies according to individual differences: more educated, knowledgeable, and informationally-motivated citizens are more likely to report awareness. Further, decision announcements increase awareness more generally, especially in cases of moderate salience. The results suggest that while the public may eventually respond to the behavior of national institutions, this response is likely first filtered through an elite subset of the population.
We address a fundamental question in judicial politics: other things being equal, do African American judges behave differently than white judges? Many presume that white judges differ from their minority counterparts in terms of sentencing, deliberation, and propensity to overturn decisions. However, to date, there is little empirical evidence on whether there are systematic differences in behavior between these judges. Here, we utilize the newly created judge-level U.S. State Supreme Court Database to assess whether judicial decisionmaking is affected by the race of the judge. Looking at all criminal cases decided by U.S. state supreme court judges from 1995-1998, we find evidence of differences between white and non-white judges, but only in states where there is no intermediate appellate court. This suggests the effects of race on judicial decisionmaking are conditioned by the institutional structure of the court system.
Supreme Court of Pakistan Judgments Dataset
This dataset contains almost 1200 judgments made by the Supreme Court of Pakistan up to May 2025. This dataset includes the judgments made by
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is data to accompany an article by Eric C. Nystrom and David S. Tanenhaus, "'Our Most Sacred Legal Commitments:' A Digital Exploration of the U.S. Supreme Court Defining Who We Are and How They Should Opine," University of Cincinnati Law Review 89, no. 4 (May 2021).
Data Creation
This data was generated using the "cap-tools" suite of programs (written by Eric C. Nystrom and available at https://github.com/ericnystrom/captools). The current data version (05202020, 07152020) included with this repository was generated by running "cap-tools" against:
Caselaw Access Project (CAP), United States jurisdiction, rev. 20200303
CAP New York jurisdiction rev. 20200302
Harold J. Spaeth, Lee Epstein, Andrew D. Martin, Jeffrey A. Segal, Theodore J. Ruger, and Sara C. Benesh. 2019 Supreme Court Database, Version 2019 Release 01. URL: http://Supremecourtdatabase.org
Harold J. Spaeth, Lee Epstein, Andrew D. Martin, Jeffrey A. Segal, Theodore J. Ruger, and Sara C. Benesh. 2019 Supreme Court Database, Version Legacy Release 05. URL: http://Supremecourtdatabase.org
Eric C. Nystrom and David S. Tanenhaus, (2020). Connecting U.S. Supreme Court Case Information and Opinion Authorship (SCDB) to Full Case Text Data (CAP), 1791-2011 (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4344917
Data Files
"CURRENT-our-kwic-cap-scdb" is a TSV of keyword-in-context (KWIC) results for the term "our," with a six word window on each side of the term. Basic results were then filtered to exclude any results that were not found in the SCDB-CAP map, and the SCDB ID was added. The file contains 79693 records plus a first-line header.
Fields:
1: "cap-id" -- ID of the case in CAP system.
2: "casename" -- short form of the case name
3: "cite" -- reporter citation (typically US Reports)
4: "date" -- year the case was decided
5: "courtname" -- Court name, in CAP. This should be "U.S." for US Supreme Court, but some records were misfiled within the CAP data and have something else here. These were manually checked for actually being US Supreme Court records, however.
6: "courtslug" -- CAP "slug" representing this court. Typically "us" but there are a handful of variations.
7: "numopins" -- Number of opinions in CAP in this case, with counting beginning at 1. CAP's detection routines get this right a lot, but there are definitely exceptions where the actual opinion count in the case, as measured by a human observer, would be different.
8: "opintype" -- The type of opinion, as determined by CAP. Generally right, with some allowance for errors, as mentioned in the other fields.
9: "opinnum" -- The number of the particular opinion in this case from which this match was drawn. 1-based counting.
10: "casematch" -- The sequential number of this match for the case as a whole, numbered from 1.
11: "opinmatch" -- The sequential number of this match for this opinion only, numbered from 1.
12: "before" -- the string of words prior to the matching word; in this data, six words. (lowercase)
13: "term" -- the term itself, here, it is always "our"
14: "after" -- the string of six words following the term (lowercase)
15: "scdb-id" -- the SCDB identification number of this case, matched using the CAP-SCDB match described above.
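To make the "before", "term", and "after" fields concrete, here is a minimal keyword-in-context sketch in Python. It is not the "cap-tools" implementation: the function name and the simple lowercase tokenizer are assumptions, and the real tool may split words differently.

```python
import re


def kwic(text, term="our", window=6):
    """Return (before, term, after) tuples for every occurrence of `term`."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    results = []
    for i, word in enumerate(words):
        if word == term:
            before = " ".join(words[max(0, i - window):i])
            after = " ".join(words[i + 1:i + 1 + window])
            results.append((before, term, after))
    return results
```

With the default six-word window, each tuple corresponds to fields 12 to 14 of one record; matches near the start or end of a text simply get a shorter context.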
"CURRENT-our-pos-cap-scdb" -- a TSV file very similar to the KWIC results file described above, with the same header and field structure, and the same results from a case perspective. The difference is that the text in fields 12, 13, and 14 was tagged with parts of speech (POS) using the Perl Lingua::EN::Tagger library, v0.28, by Aaron Coburn. The window was lengthened to seven words on each side of "our" and then tags were applied, but since the tagger also tags punctuation separately in many cases, sometimes more than seven term/TAG "words" exist in fields 12 and 14. A complete list of the tags supported by the tagger and their grammatical meanings can be found at: https://metacpan.org/source/ACOBURN/Lingua-EN-Tagger-0.30/README
"RESULTS-our-kwic-followers-opinauth-chief_071520.tsv" further extends the results contained in the files above, by isolating the noun phrase following "our" using the grammatical tags above. These noun phrases were individually categorized by our legal historian as constitutive of "culture" or "process" (or falling into an ambiguous category). (See Tanenhaus and Nystrom, listed above.) The data was further augmented by applying the opinion author's name and SCDB author ID number from the corrected opinion authorship information, available separately as Nystrom and Tanenhaus (cite above). The Chief Justice information was also added, from SCDB.
"our-casecount-by-year_normalized.tsv" -- a TSV file containing 4 columns and no header. Column 1 is the year, column 2 is the number of individual cases (not opinions) decided in that year that contained the word "our," column 3 is the total number of cases decided in that year, and column 4 is the percentage of column 3 represented by column 2 (i.e. percent of cases in a year containing "our"). Note that number of cases per year is determined from SCDB, so any minor actions such as denial of cert not included in SCDB would not be included here either.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The number of final orders made against mortgage cases disposed in the High Court. Datasets are produced on an annual year basis. The dataset is entered onto ICOS, the Integrated Courts Operations System. The data are then extracted and merged with the Central Postcode Directory, and aggregated information uploaded to this portal.
Northern Ireland Courts and Tribunals Service collects information on writs and originating summonses issued in respect of mortgages in Chancery Division of the Northern Ireland High Court. This covers both Northern Ireland Housing Executive and private mortgages, and relates to both domestic and commercial properties. A mortgage case may involve more than one address or a land property. In such cases, the first postcode address entered onto ICOS is used.
Not all writs and originating summonses lead to eviction. A plaintiff begins an action for an order for possession of property. The court, following a judicial hearing, may grant an order for possession. This entitles the plaintiff to apply for an order to have the defendant evicted. However, even where an order for eviction is issued the parties can still negotiate a compromise to prevent eviction.
When a case is disposed of, it may have more than one final order made. This database contains the last final order made. A description of the orders is below:
Possession: The court orders the defendant to deliver possession of the property to the plaintiff within a specified time. If the defendant fails to comply with the court order the plaintiff may proceed to apply to the Enforcement of Judgements Office to repossess the property and give possession of it to the plaintiff.
Sale and Possession: If the plaintiff seeks possession of property which is subject to an ‘equitable mortgage’ (i.e. normally one created informally by the deposit of deeds rather than the execution of a mortgage deed) the court may order a sale of the property to enable enforcement of the equitable mortgage and that the defendant give up possession for that purpose. The sale price is subject to approval by the court.
Suspended Possession: The court may postpone the date for delivery of possession if it is satisfied that the defendant is likely to be able, within a reasonable period, to pay any sums due under the mortgage, or to remedy any other breach of the obligations under the mortgage. A suspended possession order cannot be enforced by the plaintiff without the permission of the court, which will only be granted after a further hearing.
Other: other orders include strike out, dismiss action, and other less common orders.
Strike out: This occurs when the moving party does not wish to proceed any further, or when the court rules that there is no reasonable ground for bringing or defending the mortgage action.
Dismiss action: The mortgage action is dismissed by the courts.
Other orders: These include: (a) Declaration of possession coupled with an order for sale in lieu of partition and (b) Stay of Eviction - after a Possession Order is granted but prior to actual repossession, the Defendant may apply to Court to seek a stay of eviction which, if granted, prevents repossession for a certain defined period.
Users of this data may have been able to self-identify themselves due to the low values in some cells. Primary and secondary disclosure control methods have been applied to this data, denoted by cells with missing data in the tables. Values of less than four, but not zero, were initially suppressed, but some of these values could have been calculated using some row and column totals and thus secondary suppression was applied to the next lowest value in the row and column.
The data contain the number of final orders made against cases disposed by each Local Government District and have the following proportions of postcode coverage: 2012, 97.7%; 2013, 96.5%; 2014, 96.0%; 2015, 94.8%; 2016, 95.5%; 2017, 95.1%; 2018, 94.8%; 2019, 93.8%; 2020, 95.6%; 2021, 93.6%; 2022, 95.3%; 2023, 97.5%; 2024, 95.7%.
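The disclosure control described for this dataset can be sketched as follows. This is one simplified reading of the rule, not the publisher's procedure: primary suppression hides values from 1 to 3, and a single secondary pass then hides the lowest remaining value in any row or column that has exactly one hidden cell, since that cell could otherwise be recovered from the total. The function name and the sample table are hypothetical.

```python
def suppress(table, threshold=4):
    """Return a copy of `table` with suppressed cells replaced by None."""
    # Primary suppression: hide values below the threshold, but keep zeros.
    out = [[None if 0 < v < threshold else v for v in row] for row in table]

    def secondary(cells):
        # `cells` lists the (row, col) positions of one row or one column.
        hidden = [(r, c) for r, c in cells if out[r][c] is None]
        if len(hidden) == 1:
            visible = [(out[r][c], r, c) for r, c in cells if out[r][c] is not None]
            if visible:
                _, r, c = min(visible)  # lowest remaining value
                out[r][c] = None

    n_rows, n_cols = len(out), len(out[0])
    for r in range(n_rows):
        secondary([(r, c) for c in range(n_cols)])
    for c in range(n_cols):
        secondary([(r, c) for r in range(n_rows)])
    return out
```

A production implementation would typically repeat the secondary pass until no row or column is left with a single hidden cell; the sketch performs one pass over rows and then columns for brevity.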
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes every reference to the 1787 United States Constitutional Convention in Supreme Court opinions through the 2019 term. Other variables such as citing justice, case name, year, and portion of the opinion quoting the Convention are included among many other variables.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Contains the Supreme Court cases for the following states from the years 2000-2020: California, Hawaii, Idaho, Maryland, Massachusetts, Oklahoma, Utah, Vermont, West Virginia, and Wyoming. The data is contained in an RData file; all of the PDF opinions have been converted to character vectors.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data set (saved in Stata *.dta and .txt) contains all observations (Norwegian Supreme Court cases 2008-2018 decided in five-justice panels) and variables (independent variables measuring the complexity of cases and the dependent variable measuring time in hours scheduled for oral arguments) relevant for a complete replication of the study.
ABSTRACT OF STUDY: While high courts with fixed time for oral arguments deprive researchers of the opportunity to extract temporal variance, courts that apply the “accordion model” institutional design and adjust the time for oral arguments according to the perceived complexity of a case are a boon for research that seeks to validate case complexity well ahead of the courts’ opinion writing. We analyse an original data set of all 1,402 merits decisions of the Norwegian Supreme Court from 2008 to 2018 where the justices set time for oral arguments to accommodate the anticipated difficulty of the case. Our validation model empirically tests whether and how attributes of a case associated with ex ante complexity are linked with time allocated for oral arguments. Cases that deal with international law and civil law, have several legal players, or are cross-appeals from lower courts are indicative of greater case complexity. We argue that these results speak powerfully to the use of case attributes and/or the time reserved for oral arguments as ex ante measures of case complexity. To enhance the external validity of our findings, future studies should examine whether these results are confirmed in high courts with similar institutional design for oral arguments. Subsequent analyses should also test the degree to which complex cases and/or time for oral arguments have predictive validity on more divergent opinions among the justices and on the time courts and justices need to render a final opinion.