Speakers can correct their speech errors, but the mechanisms behind repairs are still unclear. Some findings, such as the speed of repairs and speakers’ occasional unawareness of them, point to an automatic repair process. This paper reports a finding that challenges a purely automatic repair process. Specifically, we show that as error rate increases, so does the proportion of repairs. Twenty highly-proficient English-Spanish bilinguals described dynamic visual events in real time (e.g. “The blue bottle disappears behind the brown curtain”) in English and Spanish blocks. Both error rates and proportion of corrected errors were higher on (a) noun phrase (NP)2 vs. NP1, and (b) word1 (adjective in English and noun in Spanish) vs. word2 within the NP. These results show a consistent relationship between error and repair probabilities, disentangled from position, compatible with a model in which greater control is recruited in error-prone situations to enhance the effectiveness of repair.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
"Robust standard errors" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. Even though this message is well known to methodologists, it has failed to reach most applied researchers. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help applied researchers realize these gains via an alternative perspective that offers a productive way to use robust standard errors; a new general and easier-to-use "generalized information matrix test" statistic; and practical illustrations via simulations and real examples from published research. Instead of jettisoning this extremely popular tool, as some suggest, we show how robust and classical standard error differences can provide effective clues about model misspecification, likely biases, and a guide to more reliable inferences. See also: Unifying Statistical Analysis
Data cleaning is one of the most important but time-consuming tasks for data scientists. The data cleaning task consists of two major steps: (1) error detection and (2) error correction. The goal of error detection is to identify wrong data values. The goal of error correction is to fix these wrong values. Data cleaning is a challenging task due to the trade-off among correctness, completeness, and automation. In fact, detecting/correcting all data errors accurately without any user involvement is not possible for every dataset. We propose a novel data cleaning approach that detects/corrects data errors with a novel two-step task formulation. The intuition is that, by collecting a set of base error detectors/correctors that can independently mark/fix data errors, we can learn to combine them into a final set of data errors/corrections using a few informative user labels. First, each base error detector/corrector generates an initial set of potential data errors/corrections. Then, the approach ensembles the output of these base error detectors/correctors into one final set of data errors/corrections in a semi-supervised manner. In fact, the approach iteratively asks the user to annotate a tuple, i.e., marking/fixing a few data errors. The approach learns to generalize the user-provided error detection/correction examples to the rest of the dataset, accordingly. Our novel two-step formulation of the error detection/correction task has four benefits. First, the approach is configuration free and does not need any user-provided rules or parameters. In fact, the approach considers the base error detectors/correctors as black-box algorithms that are not necessarily correct or complete. Second, the approach is effective in the error detection/correction task as its first and second steps maximize recall and precision, respectively. Third, the approach also minimizes human involvement as it samples the most informative tuples of the dataset for user labeling.
Fourth, the task formulation of our approach allows us to leverage previous data cleaning efforts to optimize the current data cleaning task. We design an end-to-end data cleaning pipeline according to this approach that takes a dirty dataset as input and outputs a cleaned dataset. Our pipeline leverages user feedback, a set of data cleaning algorithms, and a set of previously cleaned datasets, if available. Internally, our pipeline consists of an error detection system (named Raha), an error correction system (named Baran), and a transfer learning engine. As our extensive experiments show, our data cleaning systems are effective and efficient, and involve the user minimally. Raha and Baran significantly outperform existing data cleaning approaches in terms of effectiveness and human involvement on multiple well-known datasets.
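As a rough sketch of the first, recall-maximizing step, a set of base detectors can be run independently over a column and their flags unioned; the detectors below are hypothetical stand-ins, and the learned, user-guided combination step of the actual approach is omitted:

```python
# Toy sketch of step 1 of the approach described above: each base error
# detector independently marks suspicious values, and their union forms
# the initial candidate set. The detectors here are illustrative
# assumptions, not the approach's actual base algorithms.

def detector_empty(value):
    """Flag empty or placeholder values."""
    return value.strip() in {"", "?", "N/A"}

def detector_nonnumeric(value):
    """Flag values in a numeric column that do not parse as numbers."""
    try:
        float(value)
        return False
    except ValueError:
        return True

def detect_candidates(column, detectors):
    """Union of base detector flags: maximizes recall, as in step 1."""
    return {i for i, v in enumerate(column) if any(d(v) for d in detectors)}

column = ["12", "", "abc", "7.5"]
candidates = detect_candidates(column, [detector_empty, detector_nonnumeric])  # {1, 2}
```

The second, precision-oriented step would then prune this high-recall candidate set using the few tuples the user annotates.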
Recently developed low-cost Global Positioning System (GPS) data loggers are promising tools for wildlife research because of their affordability for low-budget projects and ability to simultaneously track a greater number of individuals compared with expensive built-in wildlife GPS. However, the reliability of these devices must be carefully examined because they were not developed to track wildlife. This study aimed to assess the performance and accuracy of commercially available GPS data loggers for the first time using the same methods applied to test built-in wildlife GPS. The effects of antenna position, fix interval and habitat on the fix-success rate (FSR) and location error (LE) of CatLog data loggers were investigated in stationary tests, whereas the effects of animal movements on these errors were investigated in motion tests. The units operated well and presented consistent performance and accuracy over time in stationary tests, and the FSR was good for all antenna positions...
Upon reviewing the train data for the Sberbank Russian Housing Market competition, I noticed noise & errors. Obviously, neither of these should be present in your training set, and as such, you should remove them. This is the updated train set with all noise & errors I found removed.
Data was removed when:
full_sq-life_sq<0
full_sq-kitch_sq<0
life_sq-kitch_sq<0
floor-max_floor<0
I simply deleted the row from the dataset, and did not really use anything special other than that.
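A minimal pandas sketch of those removal conditions, applied as one boolean mask (toy rows, not the actual competition data):

```python
import pandas as pd

# Tiny stand-in for the Sberbank train set; column names match the
# conditions listed above, which are copied verbatim.
df = pd.DataFrame({
    "full_sq":   [60, 40],
    "life_sq":   [45, 50],   # second row: life_sq > full_sq, an error
    "kitch_sq":  [10, 8],
    "floor":     [9, 2],
    "max_floor": [9, 5],
})

mask = (
    (df["full_sq"] - df["life_sq"] < 0)
    | (df["full_sq"] - df["kitch_sq"] < 0)
    | (df["life_sq"] - df["kitch_sq"] < 0)
    | (df["floor"] - df["max_floor"] < 0)
)
clean = df[~mask].reset_index(drop=True)   # rows matching any condition dropped
```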
A Lossless Syntax Tree Generator with Zero-shot Error Correction
This repository includes all of the datasets to reproduce the results in the paper and the srcml files that we generated. We follow Jam's procedure to compile the dataset for pretraining and finetuning.
Dataset files
Filename Description
bin.tar.gz bin files to finetune the model to fix syntactic errors
fundats.tar.gz data files to generate srcml with the error correction in the zero-shot… See the full description on the dataset page: https://huggingface.co/datasets/apcl/autorepair.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Outliers correspond to fixes with location error (LE)>3 standard deviations from the mean location error of all fixes in the same habitat (i.e., without regard to the visibility category). The last two columns report on the mean number of outliers ± standard deviation across each visibility, and LERMS values calculated from all fixes in the same habitat after removal of outlier values.
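For reference, the outlier rule above could be expressed as follows for the fixes of a single habitat (a sketch; the study's actual computation may differ):

```python
import statistics

def find_outliers(location_errors, k=3):
    """Return indices of fixes whose location error (LE) is more than
    k standard deviations from the mean LE of the same habitat."""
    mean = statistics.mean(location_errors)
    sd = statistics.stdev(location_errors)
    return [i for i, le in enumerate(location_errors)
            if abs(le - mean) > k * sd]

# Illustrative LE values in metres: twenty normal fixes and one gross error.
le = [4.8, 5.2] * 10 + [120.0]
outliers = find_outliers(le)   # flags the last fix
```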
Results from stationary unit tests performed with 40 low-cost CatLog GPS data loggers: the fix success rate (FSR) ± standard deviation (SD), mean fix acquisition time (μFAT), root mean square of the location errors (LERMS), mean location error (μLE), median location error (mLE), percentage of fixes with LE < 10 m, the mean number of outliers per unit (N outliers), and the root mean square of the location errors after removal of outliers (LERMS without outliers) for positional fixes collected for two antenna positions, three fix-interval programs, and four habitat types.
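As a reference for the metrics named above, the summaries could be computed for one batch of fixes along these lines (illustrative only, not the study's code):

```python
import math

def summarize(location_errors):
    """Compute LERMS, mean LE, median LE, and % of fixes with LE < 10 m."""
    n = len(location_errors)
    le_rms = math.sqrt(sum(e * e for e in location_errors) / n)
    mean_le = sum(location_errors) / n
    ordered = sorted(location_errors)
    median_le = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2
    pct_under_10m = 100 * sum(e < 10 for e in location_errors) / n
    return le_rms, mean_le, median_le, pct_under_10m

summarize([3.0, 4.0, 5.0, 12.0])   # LERMS ≈ 6.96, mean 6.0, median 4.5, 75% < 10 m
```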
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Software Engineers' Human Errors
This dataset contains 200 GitHub comments with manual human error annotations, released as part of the following publication:
Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.
Included Files
The "developer_human_errors.csv" file contains the full dataset of 200 software defect descriptions annotated with human error types (slips, lapses, mistakes) and T.H.E.S.E. categories.
CSV Fields
ID: Unique identifier for the comment.
SOURCE: Whether this comment originates from a commit, issue, or pull request.
COMMENT_URL: The URL linking to the comment.
COMMENT_TEXT: The raw comment text.
HUMAN_ERROR_TYPE: Whether the software defect described is a slip, lapse, or mistake.
THESE_V4_ID: Manually assigned T.H.E.S.E. category with labels corresponding to Version 4 of T.H.E.S.E.
THESE_NAME: Name corresponding to manually assigned T.H.E.S.E. category.
Annotation Details
Human error types span slips, lapses, and mistakes from James Reason's Generic Error Modelling System (GEMS):
Slips: Failures of attention.
Lapses: Failures of memory.
Mistakes: Failures of planning.
T.H.E.S.E. categories are summarized below:
S01: Typos & Misspellings
S02: Syntax Errors
S03: Overlooking documented Information
S04: Multitasking Errors
S05: Hardware Interaction Errors
S06: Overlooking Proposed Code Changes
S07: Overlooking Existing Functionality
S08: General Attentional Failure
L01: Forgetting to Finish a Development Task
L02: Forgetting to Fix a Defect
L03: Forgetting to Remove Development Artifacts
L04: Working with Outdated Source Code
L05: Forgetting an Import Statement
L06: Forgetting to Save Work
L07: Forgetting Previous Development Discussion
L08: General Memory Failure
M01: Code Logic Errors
M02: Incomplete Domain Knowledge
M03: Wrong Assumption Errors
M04: Internal Communication Errors
M05: External Communication Errors
M06: Solution Choice Errors
M07: Time Management Errors
M08: Inadequate Testing
M09: Incorrect/Insufficient Configuration
M10: Code Complexity Errors
M11: Internationalization/String Encoding Errors
M12: Inadequate Experience Errors
M13: Insufficient Tooling Access Errors
M14: Workflow Order Errors
M15: General Planning Failure
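A quick way to tally the annotations, assuming the documented CSV fields (toy rows stand in for reading developer_human_errors.csv with pandas):

```python
import pandas as pd

# Toy rows using the documented CSV fields; in practice you would load
# the real file with pd.read_csv("developer_human_errors.csv").
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "SOURCE": ["commit", "issue", "pull request"],
    "HUMAN_ERROR_TYPE": ["slip", "lapse", "slip"],
    "THESE_V4_ID": ["S01", "L02", "S02"],
})

# Count defects per GEMS human error type (slips, lapses, mistakes).
counts = df["HUMAN_ERROR_TYPE"].value_counts().to_dict()
```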
Contact
Please contact Benjamin S. Meyers (email) with questions about this data and its collection.
Acknowledgments
Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
The dataset description is shown below:
Id : 8 digit Id of the job advertisement,
Title: Title of the advertised job position,
Location: Location of the advertised job position,
ContractType: The contract type of the advertised job position, could be full-time, part-time or non-specified,
ContractTime: The contract time of the advertised job position, could be permanent, contract or non-specified,
Company: Company (employer) of the advertised job position,
Category: The Category of the advertised job position, e.g., IT jobs, Engineering Jobs, etc.
Salary per annum: Annual Salary of the advertised job position, e.g., 80000,
OpenDate: The opening time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,
CloseDate: The closing time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,
SourceName: The website where the job position is advertised.
In this task, you are required to inspect and audit the data (dataset1_with_error.csv) to identify the data problems, and then fix them. Generic and major data problems that could be found in the data include:
- Lexical errors
- Irregularities
- Integrity-constraint violations
- Inconsistency
In the end, save the error-free dataset in dataset1_solution.csv. The number of records in your solution should be the same as the number of records in the input file.
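A few such checks can be sketched in pandas; the sample rows and the specific checks below are illustrative assumptions, not the official solution:

```python
import pandas as pd

# Illustrative audit of the fields described above.
df = pd.DataFrame({
    "Id": ["12345678", "87654321"],
    "ContractType": ["full-time", "fulltime"],           # second is a lexical error
    "OpenDate": ["20120104T150000", "20120510T090000"],
    "CloseDate": ["20120204T150000", "20120301T090000"], # second closes before it opens
})

# Lexical check: ContractType must be one of the documented values.
valid_types = {"full-time", "part-time", "non-specified"}
bad_type = ~df["ContractType"].isin(valid_types)

# Integrity-constraint check: CloseDate must not precede OpenDate.
opened = pd.to_datetime(df["OpenDate"], format="%Y%m%dT%H%M%S")
closed = pd.to_datetime(df["CloseDate"], format="%Y%m%dT%H%M%S")
bad_dates = closed < opened
```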
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package of the paper "From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects"

ABSTRACT: Bugs appear in almost any software development. Solving all, or at least a large part of them, requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems as a way to report and monitor bug-fixing tasks. In recent years, several researchers have been conducting bug tracking analysis to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed in 9 categories. We have mined this information from the Jira issue tracking system concerning two different perspectives of reports with closed/resolved status: static (the latest version of reports) and dynamic (the changes that have occurred in reports over time). We also extract information from the commits (if they exist) that fix such bugs from their respective version-control system (Git). We also provide an analysis of the changes that occur in the reports as a way of illustrating and characterizing the proposed dataset. Since the data extraction process is an error-prone, nontrivial task, we believe initiatives like this could be useful to support researchers in further, more detailed investigations.

You can find the full paper at: https://doi.org/10.1145/3345629.3345639

If you use this dataset for your research, please reference the following paper:

@inproceedings{Vieira:2019:RBC:3345629.3345639,
  author = {Vieira, Renan and da Silva, Ant\^{o}nio and Rocha, Lincoln and Gomes, Jo\~{a}o Paulo},
  title = {From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects},
  booktitle = {Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering},
  series = {PROMISE'19},
  year = {2019},
  isbn = {978-1-4503-7233-6},
  location = {Recife, Brazil},
  pages = {80--89},
  numpages = {10},
  url = {http://doi.acm.org/10.1145/3345629.3345639},
  doi = {10.1145/3345629.3345639},
  acmid = {3345639},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {Bug-Fix Dataset, Mining Software Repositories, Software Traceability},
}

P.S.: We added a new dataset version (v1.0.1). In this version, we fix the git commit features that track the src and test files. More info can be found in the fix-script.py file.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
You are an analyst at "Megaline," a federal mobile operator. The company offers two tariff plans to customers: "Smart" and "Ultra." To adjust the advertising budget, the commercial department wants to understand which tariff generates more revenue.
You need to conduct a preliminary analysis of the tariffs on a small sample of customers. You have data on 500 users of "Megaline": who they are, where they are from, which tariff they use, and how many calls they made and messages they sent in 2018. You need to analyze customer behavior and conclude which tariff is better.
"Smart" Tariff: - Monthly fee: 550 rubles - Included: 500 minutes of calls, 50 messages, and 15 GB of internet traffic - Cost of services beyond the tariff package: 1. Call minute: 3 rubles (Megaline always rounds up minutes and megabytes. If the user talked for just 1 second, it counts as a whole minute); 2. Message: 3 rubles; 3. 1 GB of internet traffic: 200 rubles.
"Ultra" Tariff: - Monthly fee: 1950 rubles - Included: 3000 minutes of calls, 1000 messages, and 30 GB of internet traffic - Cost of services beyond the tariff package: 1. Call minute: 1 ruble; 2. Message: 1 ruble; 3. 1 GB of internet traffic: 150 rubles.
Note: Megaline always rounds up seconds to minutes and megabytes to gigabytes. Each call is rounded up individually: even if it lasted just 1 second, it is counted as 1 minute. For web traffic, separate sessions are not counted. Instead, the total amount for the month is rounded up. If a subscriber uses 1025 megabytes in a month, they are charged for 2 gigabytes.
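These rounding rules can be written down directly (a sketch; call durations are taken in seconds here purely to illustrate the per-call rule):

```python
import math

def charged_minutes(call_durations_sec):
    """Each call is rounded up to a whole minute individually."""
    return sum(math.ceil(d / 60) for d in call_durations_sec)

def charged_gigabytes(monthly_mb):
    """The monthly traffic total is rounded up to whole gigabytes (1 GB = 1024 MB)."""
    return math.ceil(monthly_mb / 1024)

charged_minutes([1, 61, 120])   # 1 + 2 + 2 = 5 minutes
charged_gigabytes(1025)         # 2 GB, as in the example above
```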
Step 1: Open the file with data and study the general information
File paths:
- /datasets/calls.csv
- /datasets/internet.csv
- /datasets/messages.csv
- /datasets/tariffs.csv
- /datasets/users.csv
Step 2: Prepare the data - Convert data to the required types; - Find and fix errors in the data, if any. Explain what errors you found and how you fixed them. You will find calls with zero duration in the data. This is not an error: missed calls are indicated by zeros, so they do not need to be deleted.
For each user, calculate: - Number of calls made and minutes spent per month; - Number of messages sent per month; - Amount of internet traffic used per month; - Monthly revenue from each user (subtract the free limit from the total number of calls, messages, and internet traffic; multiply the remainder by the value from the tariff plan; add the corresponding tariff plan's subscription fee).
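The revenue rule above can be sketched as a function; the tariff field names beyond rub_monthly_fee, minutes_included, and messages_included are hypothetical stand-ins for the (truncated) tariffs table:

```python
import math

def monthly_revenue(minutes, messages, mb_used, tariff):
    """Subtract the free limits, multiply the excess by the per-unit
    price, and add the monthly subscription fee."""
    extra_min = max(0, minutes - tariff["minutes_included"])
    extra_msg = max(0, messages - tariff["messages_included"])
    extra_gb = max(0, math.ceil(mb_used / 1024) - tariff["gb_included"])
    return (tariff["rub_monthly_fee"]
            + extra_min * tariff["rub_per_minute"]
            + extra_msg * tariff["rub_per_message"]
            + extra_gb * tariff["rub_per_gb"])

smart = {"rub_monthly_fee": 550, "minutes_included": 500,
         "messages_included": 50, "gb_included": 15,
         "rub_per_minute": 3, "rub_per_message": 3, "rub_per_gb": 200}

monthly_revenue(600, 60, 16 * 1024, smart)   # 550 + 300 + 30 + 200 = 1080 rubles
```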
Step 3: Analyze the data Describe the behavior of the operator's customers based on the sample. How many minutes of calls, how many messages, and how much internet traffic do users of each tariff need per month? Calculate the average, variance, and standard deviation. Create histograms. Describe the distributions.
Step 4: Test hypotheses - The average revenue of users of the "Ultra" and "Smart" tariffs differs; - The average revenue of users from Moscow differs from that of users from other regions. In the data, Moscow is written as 'Москва'; use this value when testing the hypothesis.
Set the threshold alpha value yourself.
Explain: - How you formulated the null and alternative hypotheses; - Which criterion you used to test the hypotheses and why.
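For example, the first hypothesis could be tested with an independent two-sample (Welch's) t-test via scipy; the revenues below are simulated stand-ins for the per-user monthly revenues computed in Step 2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated monthly revenues for the two tariffs (the real values would
# come from aggregating calls, messages, and traffic per user).
smart_revenue = rng.normal(1200, 300, size=200)
ultra_revenue = rng.normal(2000, 400, size=200)

alpha = 0.05
# H0: the mean revenues are equal; H1: they differ (two-sided test).
t_stat, p_value = stats.ttest_ind(smart_revenue, ultra_revenue, equal_var=False)
reject_h0 = p_value < alpha
```

Welch's variant (equal_var=False) is a common default here because there is no reason to assume the two tariffs have equal revenue variance.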
Step 5: Write a general conclusion
Formatting: Perform the task in Jupyter Notebook. Fill the program code in the cells of type code, and the textual explanations in the cells of type markdown. Apply formatting and headers.
Table users (user information):
- user_id: unique user identifier
- first_name: user's first name
- last_name: user's last name
- age: user's age (years)
- reg_date: date of tariff connection (day, month, year)
- churn_date: date of tariff discontinuation (if the value is missing, the tariff was still active at the time of data extraction)
- city: user's city of residence
- tariff: name of the tariff plan
Table calls (call information):
- id: unique call number
- call_date: call date
- duration: call duration in minutes
- user_id: identifier of the user who made the call
Table messages (message information):
- id: unique message number
- message_date: message date
- user_id: identifier of the user who sent the message
Table internet (internet session information):
- id: unique session number
- mb_used: amount of internet traffic used during the session (in megabytes)
- session_date: internet session date
- user_id: user identifier
Table tariffs (tariff information):
- tariff_name: tariff name
- rub_monthly_fee: monthly subscription fee in rubles
- minutes_included: number of call minutes included per month
- `messages_included...
This study was designed to develop crime forecasting as an application area for police in support of tactical deployment of resources. Data on crime offense reports and computer aided dispatch (CAD) drug calls and shots fired calls were collected from the Pittsburgh, Pennsylvania Bureau of Police for the years 1990 through 2001. Data on crime offense reports were collected from the Rochester, New York Police Department from January 1991 through December 2001. The Rochester CAD drug calls and shots fired calls were collected from January 1993 through May 2001. A total of 1,643,828 records (769,293 crime offense and 874,535 CAD) were collected from Pittsburgh, while 538,893 records (530,050 crime offense and 8,843 CAD) were collected from Rochester. ArcView 3.3 and GDT Dynamap 2000 Street centerline maps were used to address match the data, with some of the Pittsburgh data being cleaned to fix obvious errors and increase address match percentages. A SAS program was used to eliminate duplicate CAD calls based on time and location of the calls. For the 1990 through 1999 Pittsburgh crime offense data, the address match rate was 91 percent. The match rate for the 2000 through 2001 Pittsburgh crime offense data was 72 percent. The Pittsburgh CAD data address match rate for 1990 through 1999 was 85 percent, while for 2000 through 2001 the match rate was 100 percent because the new CAD system supplied incident coordinates. The address match rates for the Rochester crime offenses data was 96 percent, and 95 percent for the CAD data. Spatial overlay in ArcView was used to add geographic area identifiers for each data point: precinct, car beat, car beat plus, and 1990 Census tract.
The crimes included for both Pittsburgh and Rochester were aggravated assault, arson, burglary, criminal mischief, misconduct, family violence, gambling, larceny, liquor law violations, motor vehicle theft, murder/manslaughter, prostitution, public drunkenness, rape, robbery, simple assaults, trespassing, vandalism, weapons, CAD drugs, and CAD shots fired.
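The duplicate-elimination step (done with a SAS program in the original study) could be sketched in pandas; the column names here are illustrative, not the study's actual schema:

```python
import pandas as pd

# Toy CAD calls: two reports of the same incident at the same time and
# location, plus one distinct call.
cad = pd.DataFrame({
    "call_time": ["1999-06-01 22:15", "1999-06-01 22:15", "1999-06-01 23:40"],
    "address":   ["100 Main St", "100 Main St", "5 Oak Ave"],
    "call_type": ["shots fired", "shots fired", "drugs"],
})

# Keep the first report of each (time, location) pair.
deduped = cad.drop_duplicates(subset=["call_time", "address"])
```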
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
All cities with a population or seat of adm div (ca. 80,000).
Sources and Contributions
Sources: GeoNames is aggregating over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows you to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name.
Columns: Name, Country Code, Country Name, Timezone, Population, Latitude, Longitude.
Acknowledgments: These data come from Maxmind.com and have not been altered. The original source can be found by clicking here.
Additionally, Reference https://download.geonames.org/export/dump/ Attributions https://www.geonames.org/about.html
ALERT: As of 10/15/2025, we are working to resolve a data error in treatment completion variables (percent and detail). We expect a resolution by 10/31/2025, at which point downloading the revised data is advised.

The Study Interventions dataset includes information about each of the specific treatment arms that were studied in all RCTs. Each study arm was coded to indicate the type of intervention or comparison condition. This dataset includes the study-level Study Class as well as individual variables for each category of treatment, coded as Yes or No for each arm. Study arm treatment category variables are as follows: Pharmacotherapy (as well as a subclass such as antidepressant, antianxiety, etc.); Psychotherapy (as well as a subclass to identify trauma-focused or non-trauma-focused therapy); Complementary and Integrative Health (CIH; as well as a subclass such as relaxation or meditation); Nonpharmacologic Biological; Nonpharmacologic Cognitive; Collaborative Care; Other Treatments; Control.

The Study Interventions dataset also includes information on the format of the treatment (individual, group, couples, mixed); treatment delivery method (in person, by phone, by video, technology alone, technology assisted, written, or mixed); dose or amount of treatment; and treatment completion and adherence. Use this dataset to learn about treatment studies of a particular type. Each record is an arm of the study, labeled as A, B, C, or D. Values abstracted as not applicable ("NA") or not reported ("NR") from the study are null values (empty cells).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about an online electronic store. The store has three warehouses from which goods are delivered to customers.
Use this dataset to perform graphical and/or non-graphical EDA methods to understand the data first, and then find and fix the data problems:
- Detect and fix errors in dirty_data.csv
- Impute the missing values in missing_data.csv
- Detect and remove anomalies
- Check whether a customer is happy with their last order
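A first pass over such a file might look like the sketch below; the column names and sample rows are assumptions based on the description above, not the actual files:

```python
import pandas as pd

# Illustrative stand-in for the dirty order data.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "warehouse": ["A", "B", "B", "X"],   # "X" is not one of the three warehouses
    "price": [19.9, None, None, 25.0],   # missing values to impute
})

# Detect exact duplicate rows.
duplicates = df.duplicated().sum()

# Detect warehouse codes outside the known set (assumed here to be A, B, C).
unknown_warehouse = ~df["warehouse"].isin(["A", "B", "C"])

# Impute missing prices with the column mean (one simple strategy).
df["price"] = df["price"].fillna(df["price"].mean())
```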
All the Best
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
PLEASE NOTE: We know there are errors in the data although we strive to minimize them. Examples include:
• Manifests completed incorrectly by the generator or the transporter - data was entered based on the incorrect information. We can only enter the information we receive.
• Data entry errors – we now have QA/QC procedures in place to prevent or catch and fix a lot of these.
• Historically there are multiple records of the same generator. Each variation in spelling in name or address generated a separate handler record. We have worked to minimize these but many remain. The good news is that as long as they all have the same EPA ID they will all show up in your search results.
• Handlers provide erroneous data to obtain an EPA ID - data entry was based on erroneous information. Examples include incorrect or bogus addresses and names. There are also a lot of MISSPELLED NAMES AND ADDRESSES!
• Missing manifests – Not every required manifest gets submitted to the DEP. Also, of the more than 100,000 paper manifests we receive each year, some were incorrectly handled and never entered.
• Missing data – we know that the records for approximately 25 boxes of manifests, mostly prior to 1985, were lost from the database in the 1980’s.
• Translation errors – the data has been migrated to newer data platforms numerous times, and each time there have been errors and data losses.
• Wastes incorrectly entered – mostly due to complex names that were difficult to spell, or typos in quantities or units of measure.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
PLEASE NOTE: Use ALL CAPS when searching using the "Filter" function on text such as: LITCHFIELD. But not needed for the upper right corner "Find in this Dataset" search where for example "Litchfield" can be used.
We know there are errors in the data although we strive to minimize them. Examples include:
• Manifests completed incorrectly by the generator or the transporter - data was entered based on the incorrect information. We can only enter the information we receive.
• Data entry errors – we now have QA/QC procedures in place to prevent or catch and fix a lot of these.
• Historically there are multiple records of the same generator. Each variation in spelling in name or address generated a separate handler record. We have worked to minimize these but many remain. The good news is that as long as they all have the same EPA ID they will all show up in your search results.
• Handlers provide erroneous data to obtain an EPA ID - data entry was based on erroneous information. Examples include incorrect or bogus addresses and names. There are also a lot of MISSPELLED NAMES AND ADDRESSES!
• Missing manifests – Not every required manifest gets submitted to the DEP. Also, of the more than 100,000 paper manifests we receive each year, some were incorrectly handled and never entered.
• Missing data – we know that the records for approximately 25 boxes of manifests, mostly prior to 1985 were lost from the database in the 1980’s.
• Translation errors – the data has been migrated to newer data platforms numerous times, and each time there have been errors and data losses.
• Wastes incorrectly entered – mostly due to complex names that were difficult to spell, or typos in quantities or units of measure.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This repository contains the user study data accompanying the master thesis by Maria Khakimova titled "Enhancing Proof Assistant Error Messages with Hints: A User Study". The goal of the research was to investigate the impacts of hint-based error message enhancements in Agda on novice programmers. To do this, we enhanced three error messages with hints, and conducted a user study.
In the user study, we asked participants to resolve errors in pre-written Agda code, and rate the helpfulness of the error message. We collected the following data:
This repository contains the programming questions created for the user study, with the accompanying error messages (both original and enhanced) in programming_exercises.zip. We also provide the (anonymised) collected data in JSON format in response-data.json.
For more details, please read the provided README.