Artificial vs Real

On the use of syntactic distance metrics in evaluating fault mimicking techniques

View project on GitHub

Journal First - TSE 2022

Supplementary data

Empirical evaluation

RQ1 - Results per tool

How semantically and syntactically similar are seeded and real faults?
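As a rough illustration of the measures named in the headings below, here is a minimal sketch. It assumes that syntactic similarity is computed over token sequences of the seeded and real fault, and that the Ochiai score compares the sets of tests that fail on each. The tokenization, the function names, and the sample data are all hypothetical and are not taken from the project's scripts; BLEU is omitted for brevity.

```python
import math
from collections import Counter


def ochiai(failing_a: set, failing_b: set) -> float:
    """Ochiai coefficient between two sets of failing tests."""
    if not failing_a or not failing_b:
        return 0.0
    shared = len(failing_a & failing_b)
    return shared / math.sqrt(len(failing_a) * len(failing_b))


def jaccard(tokens_a: list, tokens_b: list) -> float:
    """Jaccard coefficient over the sets of distinct tokens."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(tokens_a: list, tokens_b: list) -> float:
    """Cosine similarity over token-frequency vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example: a mutant that turns ">" into ">=".
mutant_tokens = "if ( a >= b ) return a ;".split()
fault_tokens = "if ( a > b ) return a ;".split()
mutant_failing = {"testMax", "testEqual"}  # tests failing on the mutant
fault_failing = {"testEqual"}              # tests failing on the real fault

syntactic_jaccard = jaccard(mutant_tokens, fault_tokens)  # 7/9
syntactic_cosine = cosine(mutant_tokens, fault_tokens)    # 10/11
semantic_ochiai = ochiai(mutant_failing, fault_failing)   # 1/sqrt(2)
```

In this toy case the two faults differ by a single token, so both syntactic scores are high, while the Ochiai score is lower because the mutant fails one test that the real fault does not.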

BLEU Coefficient - Ochiai score

Class granularity level (PiTest - CodeBERT - DeepMutation - IBIR)

Cosine Coefficient - Ochiai score

Class granularity level (PiTest - CodeBERT - DeepMutation - IBIR)

Function granularity level (PiTest - CodeBERT - DeepMutation - IBIR)

Jaccard Coefficient - Ochiai score

Class granularity level (PiTest - CodeBERT - DeepMutation - IBIR)

Function granularity level (PiTest - CodeBERT - DeepMutation - IBIR)

Discussion

Sensitivity of mutants from the same location. Small syntactic changes lead to diverse semantic changes.

Changed lines location (PiTest - CodeBERT - IBIR)

Random lines location (PiTest - CodeBERT - IBIR)

Using Fault Detection Probability instead of Ochiai score for measurement
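One plausible reading of fault detection probability, sketched here as an assumption rather than as the project's exact definition, is the fraction of the tests killing a mutant that also reveal the real fault. The function name and the test identifiers are hypothetical.

```python
def fault_detection_probability(killing_tests: set, revealing_tests: set) -> float:
    """Share of mutant-killing tests that also fail on the real fault.

    Assumed definition for illustration only; returns 0.0 for a mutant
    that no test kills.
    """
    if not killing_tests:
        return 0.0
    return len(killing_tests & revealing_tests) / len(killing_tests)


# Hypothetical example: 2 of the 4 killing tests also reveal the fault.
fdp = fault_detection_probability(
    {"t1", "t2", "t3", "t4"},  # tests that kill the mutant
    {"t2", "t4", "t9"},        # tests that reveal the real fault
)
```

Unlike the Ochiai score, this measure is asymmetric: it is normalized only by the mutant's killing tests, so a mutant killed by few tests can still score highly.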

Subsumed Faults
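Assuming this heading refers to the standard subsumption relation from mutation testing (A subsumes B when A is killed by at least one test and every test that kills A also kills B), the following minimal sketch identifies strictly subsumed faults from their kill sets. The fault identifiers and kill sets are hypothetical.

```python
def subsumes(kill_a: set, kill_b: set) -> bool:
    """A subsumes B: A is killed, and every test killing A also kills B."""
    return bool(kill_a) and kill_a <= kill_b


def subsumed_faults(kill_sets: dict) -> set:
    """Return the faults strictly subsumed by at least one other fault."""
    subsumed = set()
    for a, kill_a in kill_sets.items():
        for b, kill_b in kill_sets.items():
            # Strict subsumption: A subsumes B but their kill sets differ.
            if a != b and subsumes(kill_a, kill_b) and kill_a != kill_b:
                subsumed.add(b)
    return subsumed


# Hypothetical kill sets: fault id -> tests that detect it.
kill_sets = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},  # every test killing m1 also kills m2
    "m3": {"t3"},
    "m4": set(),         # never killed, so neither subsuming nor subsumed
}
result = subsumed_faults(kill_sets)
```

Faults with identical kill sets subsume each other, so this sketch does not mark either of them as subsumed.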

Links to the rest of the data ℹ️ (each .tar file contains a README describing the data format and structure)

👉 Download repository with scripts
👉 Download plots
👉 Download data
👉 Download PiTest data (⚠️ heavy file)
👉 Download CodeBERT data (⚠️ heavy file)
👉 Download DeepMutation data (⚠️ heavy file)
👉 Download IBIR data (⚠️ heavy file)

Support or Contact

See our Git repository for all descriptive statistics.