The Federal Criminal Case Processing Statistics (FCCPS) data tool is an interface that can be used to analyze federal case processing data. Users can generate various statistics in the areas of federal law enforcement, prosecution/courts, and incarceration from 1998 onward. Users can also look up data based on title and section of the U.S. Criminal Code from 1994 onward. This data tool includes offenders held for violating federal laws. It excludes commitments from the D.C. Superior Court.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): Provides the raw data of the study, i.e., the tagging results for the measures described in the paper. Each subject has one row with the following columns:
A. a sequential student ID;
B. an ID that encodes a random group label and the notation;
C. the notation used: user stories (US) or use cases (UC);
D. the case the subject was assigned to: IFA, Sim, or Hos;
E. the subject's exam grade (total points out of 100); an empty cell means that the subject did not take the first exam;
F. a categorical representation of the grade: H if the grade is at least 80, M if it is at least 65 but below 80, L otherwise;
G. the total number of classes in the student's conceptual model;
H. the total number of relationships in the student's conceptual model;
I. the total number of classes in the expert's conceptual model;
J. the total number of relationships in the expert's conceptual model;
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see the tagging scheme below);
P. the researchers' judgement of how well the student explained the derivation process: well explained (a systematic mapping that can be easily reproduced), partially explained (a vague indication of the mapping), or not present.
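The grade thresholds for column F can be expressed as a small function. This is a minimal sketch, assuming grades are numbers from 0 to 100 and that an empty cell is represented as `None`; the function name `grade_category` is hypothetical and not part of the spreadsheet.

```python
from typing import Optional


def grade_category(grade: Optional[float]) -> Optional[str]:
    """Map an exam grade (0-100) to the L/M/H category used in column F.

    Returns None for an empty cell (subject did not take the first exam).
    """
    if grade is None:
        return None
    if grade >= 80:
        return "H"
    if grade >= 65:  # 65 included, 80 excluded
        return "M"
    return "L"


for g in (92, 80, 79.5, 65, 64, None):
    print(g, grade_category(g))
```

Note that the boundaries are asymmetric: a grade of exactly 80 falls into H and exactly 65 into M.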
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud (the student's conceptual model) that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert (the expert's conceptual model) that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows summary statistics of the data collection, including the number of subjects per case, per notation, per derivation-process rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The size ratio is calculated as the number of classes in the student model divided by the number of classes in the expert model. Box plots allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus of this study is the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
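The size ratio and the five values behind a box plot can be sketched as follows. The rows below are hypothetical (group label, student classes from column G, expert classes from column I), and quartiles are computed with Python's `statistics.quantiles` using the inclusive method; Excel's box-and-whisker charts may compute quartiles slightly differently.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical rows: (notation group, student classes G, expert classes I)
rows = [
    ("US", 12, 15),
    ("US", 18, 15),
    ("UC", 9, 15),
    ("UC", 14, 20),
    ("UC", 20, 20),
]


def size_ratio(student_classes: int, expert_classes: int) -> float:
    """Size ratio = classes in the student model / classes in the expert model."""
    return student_classes / expert_classes


# Collect the ratios per group (here: per notation).
by_group = defaultdict(list)
for group, g, i in rows:
    by_group[group].append(size_ratio(g, i))

# Print min, Q1, median, Q3, max for each group, as used in a box plot.
for group, ratios in sorted(by_group.items()):
    q1, q2, q3 = quantiles(ratios, n=4, method="inclusive")
    print(group, min(ratios), q1, q2, q3, max(ratios))
```

The same function applies to the relationship ratio by passing columns H and J instead of G and I.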
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the number of expert-model classes that the student model represents, correctly or incorrectly, over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with diverging stacked bar charts that illustrate correctness and completeness.
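The two ratios defined for sheet 4 translate directly into code. This is a minimal sketch of the stated formulas for a single subject; the function names are hypothetical, and the zero-denominator guard is an added assumption for subjects with no tagged classes.

```python
def correctness(al: int, wr: int, so: int, om: int) -> float:
    """Correctness = AL / (AL + OM + SO + WR)."""
    total = al + om + so + wr
    if total == 0:
        raise ValueError("no tagged classes for this subject")
    return al / total


def completeness(al: int, wr: int, om: int) -> float:
    """Completeness = (AL + WR) / (AL + WR + OM)."""
    total = al + wr + om
    if total == 0:
        raise ValueError("no expert classes tagged for this subject")
    return (al + wr) / total


# Hypothetical counts for one subject (columns K-O: AL, WR, SO, OM, MI).
al, wr, so, om = 10, 3, 1, 6
print(correctness(al, wr, so, om))  # 10 / 20 = 0.5
print(completeness(al, wr, om))     # 13 / 19
```

Missing classes (MI) do not enter either formula as defined above.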
For sheet 4 as well as the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated, which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are reported at the bottom of each sheet. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
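The t-test and Hedges' g reported at the bottom of sheets 4 to 8 can be reproduced for two groups with a short sketch. It assumes a Student's t-test with pooled variance (the sheets do not state which variant was used) and the common small-sample correction J = 1 - 3 / (4(n1 + n2) - 9); the online tool may use a slightly different formula. Only the t statistic and degrees of freedom are computed here, since a p-value needs the t distribution (for example `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation from two samples (sample variances, n - 1)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    return sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def hedges_g(a, b):
    """Cohen's d with pooled SD, multiplied by the small-sample correction J."""
    n1, n2 = len(a), len(b)
    d = (mean(a) - mean(b)) / pooled_sd(a, b)
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * j


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances) and degrees of freedom."""
    n1, n2 = len(a), len(b)
    t = (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical completeness values for two notation groups.
us = [0.6, 0.7, 0.8]
uc = [0.4, 0.5, 0.6]
print(hedges_g(us, uc), student_t(us, uc))
```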
https://data.gov.tw/license
Provides the latest monthly information on pre-concluded cases, newly received cases, pending cases, and review results of appeal cases concerning trademark law (including trademark registration, opposition, invalidation, and determination), patent law (including patent applications, oppositions, and disclosure), and other economic regulations (including international trade, business registration, commodity inspection, mining law, water law, company law, and the electronic gaming industry).
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.
Industries most vulnerable to data breaches
Some industry sectors usually see more cases of private data violations than others, depending on the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some U.S. industry sectors, including healthcare, has gradually increased over the past few years, while other sectors saw a decrease.
Largest data exposures worldwide
In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records. This is by far the most extensive data leakage reported. The case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records and later, in 2017, revised the number of leaked records to three billion. The third-biggest data breach happened in March 2018 and involved India’s national identification database Aadhaar; over 1.1 billion records were exposed.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Probability (data) structure for the matched case-control data with respect to belonging to window z (in) or not (out).
Investigator(s): National Center for State Courts, Court Statistics and Information Management Project
This data collection provides comparable measures of state appellate and trial court caseloads by type of case for the 50 states, the District of Columbia, and Puerto Rico. Court caseloads are tabulated according to generic reporting categories developed by the Court Statistics and Technology Committee of the Conference of State Court Administrators. These categories describe differences in the unit of count and the point of count when compiling each court's caseload. Major areas of investigation include: (1) case filings in state appellate and trial courts, (2) case dispositions in state appellate and trial courts, and (3) appellate opinions. Within each of these areas of investigation, cases are separated by main case type. Types include civil cases, capital punishment cases, other criminal cases, juvenile cases, administrative agency appeals, and several other types.
Years Produced: Updated annually
These data contain surface meteorology measurements from the Integrated Surface Flux Facility (ISFF) during the CASES99 field project in October 1999. Note: ISFF is now known as ISFS. The 5-minute files contain means, variances, and covariances of ISFF variables that were sampled on the main tower and the six outlying towers. We recommend combining these to obtain more statistically significant averages over longer time periods. These data have not been quality-controlled. Sonic winds have been rotated to geographic coordinates and tilt-corrected. The data are stored in NetCDF files. Information on the NetCDF file format and software is available at http://www.unidata.ucar.edu/software/netcdf/. Information specific to ISFS NetCDF files is available at https://www.eol.ucar.edu/content/isfs-netcdf-files. The NetCDF file names are cases.99MMDD.nc, where MMDD is the starting month and day in UTC.
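Combining 5-minute means and variances into longer averages is not just a matter of averaging the variances: the spread of the 5-minute means also contributes. The sketch below uses the law of total variance and assumes that every 5-minute interval holds the same number of samples, that no intervals are missing, and that the stored variances and covariances are population (divide-by-N) statistics; check the ISFS NetCDF documentation before relying on these assumptions. Reading the NetCDF files is left out, and the helper names are hypothetical.

```python
from datetime import date
from statistics import fmean


def isff_file_name(day: date) -> str:
    """Build a file name of the form cases.99MMDD.nc from the UTC start date."""
    return f"cases.{day:%y%m%d}.nc"


def combine_mean_var(means, variances):
    """Combine equal-length interval means/variances into one mean and variance."""
    m = fmean(means)
    # Within-interval variance plus the variance of the interval means.
    var = fmean(variances) + fmean(x * x for x in means) - m * m
    return m, var


def combine_cov(means_x, means_y, covariances):
    """Combine equal-length interval covariances of two variables x and y."""
    mx, my = fmean(means_x), fmean(means_y)
    return fmean(covariances) + fmean(a * b for a, b in zip(means_x, means_y)) - mx * my


# Two hypothetical 5-minute intervals with samples {0, 2} and {2, 4}.
print(isff_file_name(date(1999, 10, 15)))
print(combine_mean_var([1.0, 3.0], [1.0, 1.0]))
```

For the two example intervals, the result matches the mean (2) and population variance (2) of the pooled samples {0, 2, 2, 4}.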
https://data.gov.tw/license
Statistical data on land and building expropriation cases in various townships and cities in Taitung County
https://data.gov.tw/license
Fields: county/city, township, date, and subgroup indicators (such as confirmed cases, gender, age, and bacteriological positivity). Usage notes: for daily automated retrieval, the single-day dataset is recommended. The annual cumulative dataset contains approximately 100,000 to 300,000 records; because the data volume is relatively large, please confirm your needs before downloading it. Tuberculosis is a chronic infectious disease, and treatment of an individual case may last 6-8 months or longer. Therefore, "under management" cases in this dataset are cases still being tracked and treated, regardless of the year of illness. The dataset is updated every morning with a summary of the previous day's township-level indicators; the daily dataset contains up to 369 records.
In 2023, there were 259 cases of data compromise in the manufacturing and utilities industry in the United States, a significant increase from 2020, when the number of personal data violation incidents in the sector was only 70. The cases registered in 2022 impacted 23.9 million people.
The data contain records of defendants in criminal cases filed in United States District Court during fiscal year 2011. The data were constructed from the Administrative Office of the United States District Courts' (AOUSC) criminal file. Defendants in criminal cases may be either individuals or corporations. There is one record for each defendant in each case filed. Included in the records are data from court proceedings and offense codes for up to five offenses charged at the time the case was filed. (The most serious charge at termination may differ from the most serious charge at case filing, due to plea bargaining or action of the judge or jury.) In a case with multiple charges against the defendant, a "most serious" offense charge is determined by a hierarchy of offenses based on statutory maximum penalties associated with the charges. The data file contains variables from the original AOUSC files as well as additional analysis variables, or "SAF" variables, that denote subsets of the data. These SAF variables are related to statistics reported in the Compendium of Federal Justice Statistics, Tables 4.1-4.5 and 5.1-5.6. Variables containing identifying information (e.g., name, Social Security number) were replaced with blanks, and the day portions of date fields were also sanitized in order to protect the identities of individuals. These data are part of a series designed by the Urban Institute (Washington, DC) and the Bureau of Justice Statistics. Data and documentation were prepared by the Urban Institute.
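The "most serious" rule can be illustrated by picking the charge with the highest statutory maximum penalty. This is a simplified sketch: the `Charge` type, its fields, and the representation of penalties in months are hypothetical and do not reflect the AOUSC coding scheme, and the actual offense hierarchy may apply additional tie-breaking rules that are not modeled here (ties simply keep the first charge listed).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Charge:
    offense_code: str          # hypothetical label, not an AOUSC offense code
    statutory_max_months: int  # hypothetical: maximum penalty expressed in months


def most_serious(charges: Sequence[Charge]) -> Charge:
    """Return the charge with the highest statutory maximum penalty.

    Python's max() returns the first maximal element, so ties keep filing order.
    """
    if not charges:
        raise ValueError("defendant has no charges")
    return max(charges, key=lambda c: c.statutory_max_months)


# A defendant with three of the (up to five) recorded charges.
charges = [Charge("A", 60), Charge("B", 240), Charge("C", 120)]
print(most_serious(charges).offense_code)
```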
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Pakistan PK: Tuberculosis Case Detection Rate: All Forms data was reported at 69.000 % in 2016, an increase from 63.000 % in 2015. The series is updated yearly and has a median of 54.000 % over 17 observations from Dec 2000 to 2016. It reached an all-time high of 69.000 % in 2016 and a record low of 2.900 % in 2000. The series remains active in CEIC and is reported by the World Bank. The data is categorized under Global Database’s Pakistan – Table PK.World Bank: Health Statistics. Tuberculosis case detection rate (all forms) is the number of new and relapse tuberculosis cases notified to WHO in a given year, divided by WHO's estimate of the number of incident tuberculosis cases for the same year, expressed as a percentage. Estimates for all years are recalculated as new information becomes available and techniques are refined, so they may differ from those published previously. Source: World Health Organization, Global Tuberculosis Report. Aggregation method: weighted average.
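The detection-rate definition above is a simple percentage. The sketch below applies it to hypothetical counts, not to actual WHO figures; because WHO re-estimates the denominator over time, recomputing a past year with updated estimates can give a different rate than the one originally published.

```python
def case_detection_rate(notified_new_and_relapse: float, estimated_incident: float) -> float:
    """Notified new and relapse TB cases as a percentage of WHO's estimated incident cases."""
    if estimated_incident <= 0:
        raise ValueError("estimated incident cases must be positive")
    return 100.0 * notified_new_and_relapse / estimated_incident


# Hypothetical counts: 69,000 notified cases against 100,000 estimated incident cases.
print(case_detection_rate(69_000, 100_000))
```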
The data contain records of defendants in federal criminal cases terminated in United States District Court during fiscal year 2015. The data were constructed from the Executive Office for United States Attorneys (EOUSA) Central System file. According to the EOUSA, the United States attorneys conduct approximately 95 percent of the prosecutions handled by the Department of Justice. The Central Charge and Central System data contain variables from the original EOUSA files as well as additional analysis variables. Variables containing identifying information (e.g., name, Social Security Number) were either removed, coarsened, or blanked in order to protect the identities of individuals. These data are part of a series designed by Abt and the Bureau of Justice Statistics. Data and documentation were prepared by Abt.
The data contain records of defendants in criminal cases terminated in United States District Court during fiscal year 2004. The data were constructed from the Administrative Office of the United States District Courts' (AOUSC) criminal file. Defendants in criminal cases may be either individuals or corporations. There is one record for each defendant in each case filed. Included in the records are data from court proceedings and offense codes for up to five offenses charged at the time the case was filed. (The most serious charge at termination may differ from the most serious charge at case filing, due to plea bargaining or action of the judge or jury.) In a case with multiple charges against the defendant, a "most serious" offense charge is determined by a hierarchy of offenses based on statutory maximum penalties associated with the charges. The data file contains variables from the original AOUSC files as well as additional analysis variables, or "SAF" variables, that denote subsets of the data. These SAF variables are related to statistics reported in the Compendium of Federal Justice Statistics, Tables 4.1-4.5 and 5.1-5.6. Variables containing identifying information (e.g., name, Social Security number) were replaced with blanks, and the day portions of date fields were also sanitized in order to protect the identities of individuals. These data are part of a series designed by the Urban Institute (Washington, DC) and the Bureau of Justice Statistics. Data and documentation were prepared by the Urban Institute.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Laos LA: Tuberculosis Case Detection Rate: All Forms data was reported at 42.000 % in 2016, an increase from 37.000 % in 2015. The series is updated yearly and has a median of 28.000 % over 17 observations from Dec 2000 to 2016. It reached an all-time high of 42.000 % in 2016 and a record low of 13.000 % in 2000. The series remains active in CEIC and is reported by the World Bank. The data is categorized under Global Database’s Laos – Table LA.World Bank: Health Statistics. Tuberculosis case detection rate (all forms) is the number of new and relapse tuberculosis cases notified to WHO in a given year, divided by WHO's estimate of the number of incident tuberculosis cases for the same year, expressed as a percentage. Estimates for all years are recalculated as new information becomes available and techniques are refined, so they may differ from those published previously. Source: World Health Organization, Global Tuberculosis Report. Aggregation method: weighted average.
A feature layer view used in the Coronavirus Case Dashboard and Community Impact Dashboard to view all case information.
This study provides incident-based, case processing, and criminal history data on defendants charged in state courts during May 2002. The State Court Processing Statistics Program tracked the processing of about 15,000 felony defendants charged in 40 of the 75 largest counties during May 2002. The BJS study entitled Processing of Domestic Violence Cases in State Courts collected additional incident-based and case processing data on more than 5,000 felony and misdemeanor domestic violence defendants in 16 of the 40 counties.
https://data.gov.sg/open-data-licence
Dataset from Competition and Consumer Commission of Singapore. For more information, visit https://data.gov.sg/datasets/d_63858927edcba51528bc4ceb517bfdce/view
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially, this involves a step-by-step analysis of the small Iris dataset built into R, which includes sepal and petal measurements of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolution experiment, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R rather than creating original code. Pilot testing showed the case study was well received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
This report presents the latest statistics on the type and volume of civil county court cases that are received and processed through the justice system of England and Wales in January to March 2025. It also includes the number of judicial review cases processed by the High Court, statistics from the Business and Property Courts and annual figures on proceedings in the Royal Courts of Justice and Judge Sitting Days.
A Sankey data visualisation tool showing county court case progression and a Judicial Reviews data tool have been published alongside the current publication and are updated quarterly. A link to the Sankey tool can be found in the “Sankey Case Progression Tool Guide” and the judicial reviews tool can be found at the “Judicial Review Data Visualisation Tool” link.
A Civil data visualisation tool has been included in the publication to give a more interactive and granular view of the data on civil claims in county courts. A link to the tool can be found in the “Civil Data Visualisation Tool” page.