jablonkagroup/secs-data-incorrect-assignment-spectra dataset hosted on Hugging Face and contributed by the HF Datasets community
In 2023, more than***** of Polish respondents had no opinion on whether ChatGPT would store wrong information in the algorithm's database.
These are data and programs used to create the tables and figures for "Remotely Incorrect?...". It's probably wise to start with the Readme file.
Investigating the utility of incorrect worked examples for improving children's number line estimations.
Ancestral state reconstruction of discrete character traits is often vital when attempting to understand the origins and homology of traits in living species. The addition of fossils has been shown to alter our understanding of trait evolution in extant taxa, but researchers may avoid using fossils alongside extant species if only few are known, or if the designation of the trait of interest is uncertain. Here, I investigate the impacts of fossils and incorrectly coded fossils in the ancestral state reconstruction of discrete morphological characters under a likelihood model. Under simulated phylogenies and data, likelihood-based models are generally accurate when estimating ancestral node values. Analyses with combined fossil and extant data always outperform analyses with extant species alone, even when around one quarter of the fossil information is incorrect. These results are especially pronounced when model assumptions are violated, such as when there is a trend away from the root value. Fossil data are of particular importance when attempting to estimate the root node character state. Attempts should be made to include fossils in analysis of discrete traits under likelihood, even if there is uncertainty in the fossil trait data.
Displays all invalid point of contact emails in the Data Asset Repository. Emails are considered invalid if they cannot be validated by the trusted identity exchange (TIE). All profile and role information for an invalid email is provided.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com
Version 17 release notes:
Adds data for 2020.
Please note that the FBI has retired UCR data as of the 2020 data, so this will be the last Offenses Known and Clearances by Arrest data they release.
Changes .rda files to .rds.
Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
Version 16 release notes:
Changes release notes description, does not change data.
Version 15 release notes:
Adds data for 2019.
Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
Version 14 release notes:
Adds arson data from the UCR's Arson dataset. This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied or not during the arson).
As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson.
Adds an arson_number_of_months_missing column indicating how many months were not reporting (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing but not always, so please check if you intend to use arson data.
Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018 and I removed these duplicate agencies.
Version 13 release notes:
Adds 2018 data.
New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018 so unfounded columns for 2018 are all NA.
Version 12 release notes:
Adds population 1-3 columns - if an agency is in multiple counties, these variables show the population in the county with the most people in that agency in it (population_1), second largest county (population_2), and third largest county (population_3). Also adds county 1-3 columns which identify which counties the agency is in. The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion!
Fixes bug in the crosswalk data that is merged to this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug!
Adds a last_month_reported column which says which month was reported last. This is actually how the FBI defines number_of_months_reported so is a more accurate representation of that.
Removes the number_of_months_reported variable as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead.
Adds a number_of_months_missin
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The cancer registry system is an important part of the cancer control program. Improper coding of cancers leads to misclassification and incorrect statistical information about cancer. Therefore, the main objective of the qualitative analysis in this study was to assess the accuracy of the codes assigned to pathological reports in the centers responsible for cancer registry.
Methods: This study was descriptive, retrospective and applied. The data source in this study included 15,659 pathology reports received during the years 2017–2019 in the population-based cancer registry centers of Mazandaran province. Out of 1800 reports, 1765 samples of reports were selected by the stratified random sampling method and analyzed. A researcher-made checklist was used to collect data, and the Kappa agreement coefficient and Cohen’s agreement percentage were presented to check the accuracy of the reports. STATA13 was used for data analysis.
Results: 1150 of 1765 pathology reports (65.0%) did not have topographic, morphological and behavioral codes, and 410 (23.2%) had grade codes. The Kappa coefficient was 0.916 in reports with a topography code and 0.929 in reports with a morphology code. In behavior coding, the highest agreement was in the category of benign cancers at 65.2%, and in grade coding, agreement in the category without grade was 100%.
Conclusion: Most reports concerned carcinoma morphology, and the Kappa coefficient in morphology codes has almost complete reliability. In terms of behavior coding, there was the most agreement in the category of benign cancers. The Kappa coefficient in given behavior codes has low reliability.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Review of Economics and Statistics: Forthcoming
https://www.icpsr.umich.edu/web/ICPSR/studies/1331/terms
Drawing on an article by Singh (1993), many discussions of the evolutionary psychology of heterosexual male preferences have reported a remarkable consistency in the waist-to-hip ratios of Playboy centerfold models and Miss America pageant winners over time. We re-examine the measurement data on these American beauty icons and show that these reports are false in several ways. First, the variation in waist-to-hip ratios among these women is greater than reported. Second, the center of the distribution of waist-to-hip ratios is not 0.70, but less than this. Third, the average waist-to-hip ratio within both samples has changed over time in a manner that is statistically significant and can be regarded as mutually consistent. Taken together, the findings undermine some of the evidence given for the repeated suggestion that there is something special--evolutionarily hard-wired or otherwise--about a specific female waist-to-hip ratio of 0.70 as a preference of American heterosexual males.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A flurry of current interest in time series has focused on clarifying equation balance, fractional integration, and cointegration testing. Despite this, a number of recent suggestions may continue to lead scholars towards incorrect inferences. In this comment, I investigate the likelihood of drawing both correct and incorrect inferences under a variety of stationary and non-stationary data-generating processes. I extend previous work in this area by focusing on both short- and long-run effects using several popular model specifications. Given these findings, I conclude by offering a variety of recommendations to practitioners about how they can best specify their model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset released with the paper titled: "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board".
The dataset is a single newline-delimited JSON file. Each line in the file is a JSON object representing a full 4chan /pol/ thread. The JSON objects contain all the key/values returned by the 4chan API, along with three additional keys (entities, perspectives, and extracted_poster_id).
For each JSON object we complement the data with the list of named entities we detect in each post, using the spaCy Python library. In addition, for each post we add scores returned by Google’s Perspective API, specifically seven scores in the [0, 1] interval.
For the detailed description of every key in the JSON structure, along with the type of the value, please read the readme.pdf file provided with this dataset.
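As a rough illustration of the one-thread-per-line layout described above, the following Python sketch reads newline-delimited JSON from a stream and counts posts per thread. The sample records are made up for the example; the "posts" list, the placement of the three additional keys on each post, and the "TOXICITY" score name are assumptions, so check readme.pdf for the actual structure.

```python
import io
import json
from typing import Iterable, Iterator


def iter_threads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one thread (a dict) per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up stand-in for the real file. The "posts" list, the per-post
# placement of the extra keys, and the score name are assumptions.
sample = io.StringIO(
    '{"posts": [{"no": 1, "com": "first", "entities": [], '
    '"perspectives": {"TOXICITY": 0.1}, "extracted_poster_id": "abc"}]}\n'
    '{"posts": [{"no": 2, "com": "second"}, {"no": 3, "com": "reply"}]}\n'
)

threads = list(iter_threads(sample))
post_counts = [len(thread.get("posts", [])) for thread in threads]
print(post_counts)
```

For the real file, passing an open file handle (for example `open("pol_062016-112019_labeled.ndjson", encoding="utf-8")`) to `iter_threads` keeps memory use low, since only one thread is parsed at a time.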
If you find our dataset useful, please cite our paper:
@article{papasavva2020raiders,
  title={Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board},
  author={Antonis Papasavva and Savvas Zannettou and Emiliano De Cristofaro and Gianluca Stringhini and Jeremy Blackburn},
  journal={14th International AAAI Conference On Web And Social Media (ICWSM), 2020},
  year={2020}
}
How to extract the data:
Note that the data is compressed. See the instructions below on how to extract the data:
Step 1: Open a terminal window and navigate to the path where the file pol_0616-1119_labeled.tar.zst is located.
Step 2: Run the following command:
unzstd pol_0616-1119_labeled.tar.zst
The above command produces a file named pol_0616-1119_labeled.tar in the same directory.
Step 3: Again, from your terminal window, run this command:
tar -xvf pol_0616-1119_labeled.tar
When the above command finishes, you will get (in the same directory) the extracted data - a file named pol_062016-112019_labeled.ndjson.
There are many applications that can be used to extract this data on Windows available online. The authors cannot recommend specific applications. Note that the file is compressed twice so you will need to perform the data extraction twice - once on the downloaded file, and once on the file that was extracted from the downloaded file.
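The two extraction steps above could also be scripted. The sketch below only wraps the same `unzstd` and `tar -xvf` commands with Python's standard `subprocess` module; the function names are hypothetical, and it assumes both tools are installed and on the PATH.

```python
import subprocess
from typing import List


def extraction_commands(archive_name: str) -> List[List[str]]:
    """Return the two commands from the steps above for a .tar.zst archive."""
    if not archive_name.endswith(".tar.zst"):
        raise ValueError("expected a file name ending in .tar.zst")
    tar_name = archive_name[: -len(".zst")]  # unzstd writes this file
    return [["unzstd", archive_name], ["tar", "-xvf", tar_name]]


def extract(archive_name: str, directory: str = ".") -> None:
    """Run both steps inside `directory`, stopping if either command fails."""
    for command in extraction_commands(archive_name):
        subprocess.run(command, cwd=directory, check=True)


commands = extraction_commands("pol_0616-1119_labeled.tar.zst")
for command in commands:
    print(" ".join(command))
```

Calling `extract("pol_0616-1119_labeled.tar.zst")` from the directory that holds the download would run both steps in order.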
Please do not hesitate to contact the author of this study at antonis.papasavva@ucl.ac.uk if you face any problems.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the second half of the 2010s, the /pol/ (‘politically incorrect’) forum on the 4chan image board has emerged as a space within which various extreme political ideologies are discussed and cultivated, occasionally informing off-site acts of political extremism. While previous research has often studied this space as a unified whole, it is relevant to more specifically demarcate different publics within 4chan’s /pol/ board, apart from studying it as an ‘amorphous blob’. This paper focuses specifically on ‘generals’ - recurring threads with a specific thematic focus identified by a particular vernacular phrase or tag. By identifying them it is possible to subset the board’s archive into multiple distinct datasets comprising discussions about a particular topic, such as Donald Trump, the Syria war, or British politics. We provide a dataset containing 58,841 opening posts and 13,697,738 replies to those, divided over 329 thematically distinct ‘general thread’ collections. In this paper we outline our data collection and query protocol, the structure of the data and its rationale, as well as a number of suggested research uses for this new data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Released under formal Government Information Public Access (GIPA) Application to Department of Finance, Services and Innovation (DFSI) - FA#15 15-16
"1. Please state the total number of fines issued for travelling in a Transit (T2) Lane and a Transit (T3) Lane without the required number of passengers in the a.) 2014-15 financial year, b.) the 2013-14 financial year and c.) the 2012-13 financial year.
2. Please state the location of each of the fines issued in 1 for each of the 3 years. If only suburb or road name is available, please state.
3. Please state the total number of fines issued for travelling in a Buses only lane in the a.) 2014-15 financial year, b.) the 2013-14 financial year and c.) the 2012-13 financial year.
4. Please state the location of each of the fines issued in 3 for each of the 3 years. If only suburb or road name is available, please state.
5. Please state the total number of fines issued for travelling in T-way lane in the a.) 2014-15 financial year, b.) the 2013-14 financial year and c.) the 2012-13 financial year.
6. Please state the location of each of the fines issued in 3 for each of the 3 years. If only suburb or road name is available, please state.
7. Please state the total number of fines issued for travelling in Bicycle lane in the a.) 2014-15 financial year, b.) the 2013-14 financial year and c.) the 2012-13 financial year.
8. Please state the total number of fines issued for travelling in Light Rail lane in the a.) 2014-15 financial year, b.) the 2013-14 financial year and c.) the 2012-13 financial year. "
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Some argue that support for the social safety net in the United States is influenced by beliefs about the beneficiaries’ race. Information treatments have the potential to change these beliefs, but for them to be policy relevant, their effects must last beyond the intervention. Our findings from two parallel experiments that exploit the different racialized histories of welfare and unemployment insurance indicate that racial beliefs do predict stated support for the racially stigmatized welfare program but not for the less stigmatized unemployment program. We also find these beliefs are stable if uncorrected and that they can be persistently corrected.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental data from: "The impact of wrong social information on collective wisdom in human groups", with authors: Bertrand Jayles, Ramón Escobedo, Stéphane Cezera, Adrien Blanchet, Tatsuya Kameda, Clément Sire and Guy Theraulaz.
Abstract: A major problem that resulted from the massive use of social media networks is the diffusion of incorrect information. However, very few studies have investigated the impact of incorrect information on individual and collective decisions. We performed experiments in which participants had to estimate a series of quantities twice, before and after receiving some social information. Unbeknownst to the participants, we controlled the degree of inaccuracy of the social information through “virtual influencers” that provided some incorrect information. We find that incorrect social information does not necessarily impair the collective wisdom of groups and can even help a group perform better when it overestimates the true value, by partly compensating a human underestimation bias. Moreover, we find that a large proportion of individuals resist social information by only partially following it, thus improving individual and collective accuracy over a large range of incorrect information.
There is only one data file: wrong-social-info.csv
Structure of the file: (soon)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Norway's imports of invalid carriages from Brazil were US$2.86 thousand during 2008, according to the United Nations COMTRADE database on international trade. Norway Imports from Brazil of Invalid Carriages - data, historical chart and statistics - was last updated in July 2025.
This dataset has Air Temperature data from NOAA NOS Center for Operational Oceanographic Products and Services (CO-OPS).
WARNING: These preliminary data have not been subjected to the National Ocean Services (NOS) Quality Control procedures, and do not necessarily meet the criteria and standards of official NOS data. They are released for limited public use with appropriate caution.
WARNING:
* Queries for data MUST include stationID= and time>=.
* Queries USUALLY include time<= (the default end time corresponds to 'now').
* Queries MUST be for less than 30 days worth of data.
* The data source isn't completely reliable. If your request returns no data when you think it should:
  * Try revising the request (e.g., a different time range).
  * The list of stations offering this data may be incorrect.
  * Sometimes a station or the entire data service is unavailable. Wait a while and try again.
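The query rules in the warning above (a required stationID= and time>=, an optional time<=, and a window of less than 30 days) can be checked before a request is sent. The following Python sketch is one possible way to do that; the function name, the quoting of the station ID, the ISO 8601 UTC timestamp format, and the placeholder station ID are all assumptions, not details confirmed by the dataset description.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_WINDOW = timedelta(days=30)  # queries must cover LESS than this


def build_constraints(
    station_id: str, start: datetime, end: Optional[datetime] = None
) -> str:
    """Validate the query rules and return a constraint string.

    `start` and `end` must be timezone-aware UTC datetimes. When `end`
    is omitted, the service default ('now') is used for the window check.
    """
    if not station_id:
        raise ValueError("stationID is required")
    effective_end = end if end is not None else datetime.now(timezone.utc)
    if effective_end <= start:
        raise ValueError("end time must be after start time")
    if effective_end - start >= MAX_WINDOW:
        raise ValueError("queries must cover less than 30 days of data")
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format
    parts = [f'stationID="{station_id}"', f"time>={start.strftime(fmt)}"]
    if end is not None:
        parts.append(f"time<={end.strftime(fmt)}")
    return "&".join(parts)


# "1234567" is a placeholder station ID, not a real station.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 8, tzinfo=timezone.utc)
query = build_constraints("1234567", start, end)
print(query)
```

Rejecting an over-long window locally avoids a round trip to a service that the description itself calls not completely reliable; an empty response may still occur for the other reasons listed above.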
This dataset has Wind data from NOAA NOS Center for Operational Oceanographic Products and Services (CO-OPS).
WARNING: These preliminary data have not been subjected to the National Ocean Services (NOS) Quality Control procedures, and do not necessarily meet the criteria and standards of official NOS data. They are released for limited public use with appropriate caution.
WARNING:
* Queries for data MUST include stationID= and time>=.
* Queries USUALLY include time<= (the default end time corresponds to 'now').
* Queries MUST be for less than 30 days worth of data.
* The data source isn't completely reliable. If your request returns no data when you think it should:
  * Try revising the request (e.g., a different time range).
  * The list of stations offering this data may be incorrect.
  * Sometimes a station or the entire data service is unavailable. Wait a while and try again.
INCORRECT_DISTSCHOOL_ADDRESSES