Mining tool and large-scale datasets of single statement bug fixes in Python
Single statement bugs (and bug fixes) play a major role in the evaluation and design of automatic bug finders and program repair. With the recent advances in data-driven bug detection and repair, single statement bug fixes at the scale of millions of examples have become more important than ever. For this reason, we are releasing three new datasets consisting of single statement changes and bug fixes from over 500K Python Git projects.
We noticed that our datasets contain a significant number of duplicate patches that were missed by our deduplication procedure. To mitigate this, we are releasing cleaned versions of TSSB-3M and SSB-9M:
CTSSB-1M A cleaned version of TSSB-3M containing nearly a million isolated single statement bug fixes.
CSSB-2.6M A cleaned version of SSB-9M containing over 2.6 million single statement bug fixes.
The cleaned datasets are also available on Zenodo.
To download our datasets, use:
TSSB-3M: A dataset of over 3 million isolated single statement bug fixes. Each bug fix is related to a commit in a public Python project that does not change more than a single statement.
SSB-9M: A dataset of over 9 million single statement bug fixes. Each fix modifies at least a single statement to fix a bug. However, the related code changes might incorporate changes to other files.
SSC-28M: A dataset of over 28 million general single statement changes. We are releasing this dataset with the intention to facilitate research in software evolution. Therefore, a code change might not necessarily relate to a bug fix.
All datasets are also available at Zenodo.
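Once a dataset file has been downloaded, it is usually more practical to stream it than to load millions of entries into memory at once. The following is a minimal sketch under two assumptions that are not stated in this README: that the file is a gzip-compressed JSON Lines file (one JSON object per line), and that each entry carries a pattern label in a field called `sstub_pattern`. Both the format and the field name are hypothetical here, so please check them against the actual files before relying on this.

```python
import gzip
import json
from collections import Counter


def iter_records(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file.

    ASSUMPTION: the dataset is stored as gzip-compressed JSON Lines.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_patterns(path, field="sstub_pattern"):
    """Count how often each value of `field` occurs.

    ASSUMPTION: `sstub_pattern` is a hypothetical field name.
    Entries without the field are counted under "UNKNOWN".
    """
    return Counter(record.get(field, "UNKNOWN") for record in iter_records(path))
```

Calling `count_patterns` with the path of a local dataset file returns a `Counter` that can be compared against the pattern table further below.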
Datasets of single statement bugs such as ManySStuBs4J in Java or PySStuBs in Python have helped us a lot in our research. However, their size limited the upscaling of experiments and data analyses. Therefore, we are excited to release three new datasets several magnitudes larger than any existing bug collections. Here are our main takeaways for our datasets:
Dataset statistics: SSB-9M contains more than 50x more SStuBs than PySStuBs. With focus on isolated bug fixes, TSSB-3M still contains more than 20x more SStuBs than PySStuBs. This gives us access to not only a larger quantity of simple bugs but also to a higher variety.
NonSStuBs: Most single statement bugs can be repaired with the same operations as needed for repairing SStuBs. It is likely that effective detection and repair methods for SStuBs will be effective for general single statement bugs.
Bug-fix complexity: Single statement bugs typically require simple fixes. A bug is typically fixed by 4 to 5 AST modifications.
Typos: Typos are more common than expected for single statement bugs. At least 20% of single statement bugs are likely caused by a typo.
In the following, we provide a closer look at the dataset statistics of TSSB-3M and SSB-9M.
Pattern Name | TSSB-3M | SSB-9M |
---|---|---|
Change Identifier Used | 237K | 659K |
Change Binary Operand | 174K | 349K |
Same Function More Args | 150K | 457K |
Wrong Function Name | 134K | 397K |
Add Function Around Expression | 117K | 244K |
Change Attribute Used | 104K | 285K |
Change Numeric Literal | 97K | 275K |
More Specific If | 68K | 121K |
Add Method Call | 60K | 118K |
Add Elements To Iterable | 57K | 175K |
Same Function Less Args | 50K | 169K |
Change Boolean Literal | 37K | 82K |
Add Attribute Access | 32K | 74K |
Change Binary Operator | 29K | 71K |
Same Function Wrong Caller | 25K | 46K |
Less Specific If | 22K | 45K |
Change Keyword Argument Used | 20K | 59K |
Change Unary Operator | 15K | 23K |
Same Function Swap Args | 8K | 77K |
Change Constant Type | 6K | 12K |
Some examples for Python bug fixes that are classified as SStuBs will be coming soon in our repository. Until then, ManySStuBs4J provides a nice overview of examples for Java.
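To give a rough idea of what some of the patterns above look like in Python, here is a small sketch that labels a before/after statement pair with one of five pattern names from the table. It is not the classifier used to build the datasets: the rules below are simplified assumptions based only on the pattern names, the helper names (`locate_change`, `classify`) are made up for this illustration, and treating comparison and boolean operators as binary operators is our own choice.

```python
import ast

NUMERIC = (int, float, complex)
OPERATORS = (ast.operator, ast.cmpop, ast.boolop)


def _key(value):
    """Comparable representation of an AST field value."""
    if isinstance(value, ast.AST):
        return ast.dump(value)
    if isinstance(value, list):
        return [_key(item) for item in value]
    return repr(value)  # repr keeps 1 and True distinct


def locate_change(old, new, parent=None, field=None):
    """Return (old, new, parent, field) for the innermost pair of nodes that
    contains every difference, or None if both trees are identical."""
    if _key(old) == _key(new):
        return None
    if type(old) is type(new):
        changed = [name for name in old._fields
                   if _key(getattr(old, name, None)) != _key(getattr(new, name, None))]
        if len(changed) == 1:
            a, b = getattr(old, changed[0], None), getattr(new, changed[0], None)
            if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
                pairs = [(x, y) for x, y in zip(a, b) if _key(x) != _key(y)]
                if len(pairs) == 1:
                    a, b = pairs[0]
            if isinstance(a, ast.AST) and isinstance(b, ast.AST):
                return locate_change(a, b, old, changed[0])
    return old, new, parent, field


def classify(before, after):
    """Return a pattern name for a few simple cases, else None (simplified rules)."""
    change = locate_change(ast.parse(before), ast.parse(after))
    if change is None:
        return None
    old, new, parent, field = change
    if isinstance(old, ast.Constant) and isinstance(new, ast.Constant):
        if isinstance(old.value, bool) and isinstance(new.value, bool):
            return "Change Boolean Literal"
        if (isinstance(old.value, NUMERIC) and isinstance(new.value, NUMERIC)
                and not isinstance(old.value, bool)
                and not isinstance(new.value, bool)):
            return "Change Numeric Literal"
    if isinstance(old, OPERATORS) and isinstance(new, OPERATORS):
        return "Change Binary Operator"
    if isinstance(old, ast.Name) and isinstance(new, ast.Name):
        if isinstance(parent, ast.Call) and field == "func":
            return "Wrong Function Name"
        return "Change Identifier Used"
    return None  # not covered by this sketch


examples = [
    ("x = a + b", "x = a - b"),
    ("n = 10", "n = 11"),
    ("flag = True", "flag = False"),
    ("total = count", "total = amount"),
    ("r = foo(x)", "r = bar(x)"),
]
labels = [classify(before, after) for before, after in examples]
```

The example pairs are invented for illustration and are not taken from the datasets. Any pair that does not match one of the five rules is returned as `None`.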
Only around 40% of all single statement bugs in our datasets can be classified by a SStuB pattern (in one of the categories of the previous section). For this reason, we analysed the remaining single statement bugs in the TSSB-3M dataset.
We found that NonSStuBs (i.e. bugs that do not classify as a SStuB) are actually quite similar to SStuBs. In the previous image, we compared the edit operations needed to fix a SStuB with the operations needed to fix a NonSStuB. We observed that most NonSStuBs employ the same or similar operation types to fix a bug. Still, there exist some infrequent bugs (SStuB-unrelated) that are not covered by any SStuB category.
Most existing methods in automatic repair focus on bugs that can be fixed within a few edit operations. Therefore, we analyzed how many edit operations are needed in our TSSB-3M dataset.
The figure shows the distribution of the length of individual bug fixes. We find that most bugs can be fixed within a few edit operations (4-5 operations). However, there still exist bugs that require a much higher number of fix operations.
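This README does not spell out how the edit operations are computed, so the following is only a rough proxy and not the procedure behind the figure. It flattens the AST of the statement before and after the fix into a pre-order sequence of node labels and counts the inserted, deleted, and replaced labels reported by a sequence diff. A proper tree differencing algorithm can give different numbers; the function names are hypothetical.

```python
import ast
from difflib import SequenceMatcher


def node_labels(source):
    """Pre-order list of AST node labels for a piece of Python source."""
    def visit(node):
        label = type(node).__name__
        if isinstance(node, ast.Name):
            label += f":{node.id}"
        elif isinstance(node, ast.Attribute):
            label += f":{node.attr}"
        elif isinstance(node, ast.Constant):
            label += f":{node.value!r}"
        yield label
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(source)))


def count_edit_operations(before, after):
    """Approximate the number of AST edits with a diff over node labels.

    ASSUMPTION: a sequence diff stands in for a real tree diff here.
    """
    old, new = node_labels(before), node_labels(after)
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            total += max(i2 - i1, j2 - j1)
        elif tag == "delete":
            total += i2 - i1
        elif tag == "insert":
            total += j2 - j1
    return total
```

For example, `count_edit_operations("x = a + b", "x = a - b")` only differs in the operator node, whereas adding an argument to a call inserts several nodes at once.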
Humans commonly mistype when writing text. Since code is also written text, we expect that typos occur often in code and hence are also common for single statement bug fixes.
In the shown image, we count how often a bug is fixed by inserting, removing or transposing up to two characters. Unsurprisingly, we found that typos occur often both in SSB-9M and TSSB-3M (at least 20% of all bug fixes). In addition, typos are frequent among fixes that address identifiers or strings.
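The typo criterion described above can be sketched as a small edit distance that only allows inserting a character, removing a character, or transposing two adjacent characters. This is our own minimal reading of the criterion, not the exact implementation behind the figure: we interpret "up to two characters" as at most two such operations, and the names `typo_distance` and `is_typo_fix` are hypothetical.

```python
def typo_distance(before, after):
    """Minimum number of insertions, deletions, and adjacent transpositions
    needed to turn `before` into `after` (substitutions are not allowed)."""
    rows, cols = len(before) + 1, len(after) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of the prefix
    for j in range(cols):
        dist[0][j] = j  # insert every character of the prefix
    for i in range(1, rows):
        for j in range(1, cols):
            best = min(dist[i - 1][j] + 1,   # remove a character
                       dist[i][j - 1] + 1)   # insert a character
            if before[i - 1] == after[j - 1]:
                best = min(best, dist[i - 1][j - 1])  # characters match
            if (i > 1 and j > 1
                    and before[i - 1] == after[j - 2]
                    and before[i - 2] == after[j - 1]):
                best = min(best, dist[i - 2][j - 2] + 1)  # swap neighbours
            dist[i][j] = best
    return dist[rows - 1][cols - 1]


def is_typo_fix(before, after, max_edits=2):
    """True if the fix changes the text by one to `max_edits` typo operations.

    ASSUMPTION: the threshold of two operations is our interpretation.
    """
    return 0 < typo_distance(before, after) <= max_edits
```

With this definition, a fix from `lenght` to `length` counts as a typo fix (one transposition), while replacing `foo` with `bar` does not, because it needs three removals and three insertions.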