DNA Sequence Assembler v4
What is the nucleotide confidence score?
Definition: The confidence score (also called base trust information, confidence value, or quality value) is a number assigned to each base in a chromatogram that indicates how much that base can be trusted. A low confidence score means that the predicted base is unreliable (the base caller may be wrong); a high confidence score means that the base can be trusted. The confidence score can have any value from 0 to 100. By default, DNA Baser considers a base untrusted if its confidence score is under 25-30.
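As a rough illustration of the threshold rule above, the following Python sketch flags the bases whose score falls below a cutoff. The function name, the cutoff of 25 (the lower end of the 25-30 range quoted above), and the sample data are assumptions made for this example; it is not DNA Baser's actual code.

```python
def flag_untrusted(bases, scores, cutoff=25):
    """Return a list of (base, score, trusted) tuples.

    cutoff=25 is an assumed value taken from the low end of the
    25-30 range mentioned in the text; it is not a documented setting.
    """
    if len(bases) != len(scores):
        raise ValueError("bases and scores must have the same length")
    # A base counts as trusted when its score reaches the cutoff.
    return [(b, s, s >= cutoff) for b, s in zip(bases, scores)]


# Hypothetical example data: one confidence score per called base.
calls = flag_untrusted("ACGT", [12, 40, 24, 58])
untrusted = [base for base, score, trusted in calls if not trusted]
print(untrusted)  # ['A', 'G']
```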
Why are confidence scores important?
DNA Sequence Assembler is the only software on the market that FULLY automates the sequence assembly process, speeding it up by about 1000%.
To generate high quality, trustworthy contigs WITHOUT any human intervention, DNA Sequence Assembler relies on the confidence score information assigned to each base, as follows:
- Low quality bases are usually gathered at the ends of your samples. These clusters of low quality bases are called untrusted regions. When performing sequence assembly or other DNA sequence analysis, the user normally has to cut out (trim) those bases from the samples manually; otherwise the assembly may be very poor (too many ambiguities) or even wrong. This is a time-consuming process!
- DNA Sequence Assembler can automatically detect and cut the untrusted regions before it assembles the sequences. During sequence assembly, if an ambiguity is encountered, DNA Sequence Assembler uses the confidence score information to correct the ambiguity for you automatically.
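The two steps above (trimming the untrusted ends, then using scores to settle a disagreement between reads) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the sliding-window average, the window size, the cutoff of 25, and the "higher score wins" rule are generic techniques chosen for the example, and the function names are hypothetical. The algorithm DNA Sequence Assembler actually uses is not described here.

```python
def trim_untrusted_ends(scores, cutoff=25, window=5):
    """Return (start, end) of the slice to keep after end trimming.

    Assumed technique: keep everything between the first and the last
    window whose mean score reaches the cutoff, then drop any single
    low-scoring bases left at the two edges.
    Returns (0, 0) when no window qualifies (including reads shorter
    than the window).
    """
    n = len(scores)
    good = [i for i in range(n - window + 1)
            if sum(scores[i:i + window]) / window >= cutoff]
    if not good:
        return 0, 0
    start, end = good[0], good[-1] + window
    # A qualifying window always holds at least one base >= cutoff,
    # so these loops stop inside the window.
    while scores[start] < cutoff:
        start += 1
    while scores[end - 1] < cutoff:
        end -= 1
    return start, end


def resolve_position(base_a, score_a, base_b, score_b):
    """Settle a disagreement between two aligned reads (assumed rule).

    The base with the higher confidence score wins; a tie between
    different bases stays ambiguous and is reported as 'N'.
    """
    if base_a == base_b:
        return base_a
    if score_a == score_b:
        return "N"
    return base_a if score_a > score_b else base_b


# Hypothetical read: poor scores at both ends, good scores in the middle.
bases = "NNACGTACGTANN"
scores = [5, 8, 10, 30, 40, 45, 50, 42, 38, 35, 12, 9, 6]
start, end = trim_untrusted_ends(scores, cutoff=25, window=3)
print(bases[start:end])                    # CGTACGT
print(resolve_position("A", 40, "G", 12))  # A
```

Requiring a whole window to reach the cutoff, rather than a single base, keeps one isolated good base inside a poor region from ending the trim too early.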
Starting with v4, DNA Sequence Assembler no longer needs the confidence score information to be present in the chromatogram. If the confidence scores are missing, DNA Baser computes them automatically using its new state-of-the-art base caller.
Which file formats can store confidence score information?
Only SCF and ABI chromatogram files can store confidence score information. FASTA, SEQ and TXT files cannot store this information.
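If you sort files with a script before loading them, the rule above can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and the extension spellings (in particular `.ab1`, a common extension for ABI files) are assumptions rather than something stated in this document.

```python
from pathlib import Path

# From the text: only SCF and ABI chromatograms can hold confidence scores.
# The exact extension spellings below are assumptions.
FORMATS_WITH_SCORES = {".scf", ".abi", ".ab1"}
FORMATS_WITHOUT_SCORES = {".fasta", ".seq", ".txt"}


def can_store_scores(filename):
    """Return True/False for known formats, None for unknown extensions."""
    ext = Path(filename).suffix.lower()
    if ext in FORMATS_WITH_SCORES:
        return True
    if ext in FORMATS_WITHOUT_SCORES:
        return False
    return None


print(can_store_scores("sample_01.ab1"))    # True
print(can_store_scores("sample_01.fasta"))  # False
```

A True result only means the format is able to hold scores. Whether a particular file actually contains them is a separate question.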
If I have ABI or SCF files, do they necessarily contain confidence scores?
Not necessarily. If your sequencing machine is set to generate SCF files, those files will almost certainly (99.9%) contain confidence score information. If your sequencing machine is set to generate ABI chromatogram files, some of the files may not contain confidence score information. However, your technician can fix this easily by setting the machine to store confidence scores in your ABI chromatogram files.
What should I do if my ABI files do not include the confidence score field?
Tell your technician or sequencing company to include confidence scores in your chromatogram files (ABI or SCF). All they have to do is check a checkbox in the machine's software interface. In any case, DNA Sequence Assembler v4 computes the confidence scores automatically, so this is no longer a concern.
Why does DNA Baser not automatically trim the low quality regions?
If DNA Sequence Assembler does not automatically trim the low quality regions (untrusted regions) in your chromatograms, the most likely cause is that the base caller is not active. This page explains how to activate it.
How can I find out if my chromatograms contain information about confidence scores?
With DNA Sequence Assembler it is easy to get this information: in DNA Baser, double-click any ABI/SCF chromatogram file to open it. If you see little green bars above each base in the chromatogram, then your file includes the confidence score information.
Fig 1. ABI file with confidence score included
Fig 2. ABI file without confidence score included (green bars indicating the quality of each base are missing)
Another way to check whether confidence scores are present is to view the properties of the file.
For the currently selected chromatogram, press Control+Enter to open the properties dialog:
Information about the quality scores is also shown in the Log.
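Outside the program, the same check can be approximated with a short script. The sketch below assumes the ABIF layout used by ABI files, in which a directory of tagged entries follows the header and per-base quality values are stored under the `PCON` tag. It covers ABI files only (not SCF), does not handle files wrapped in an extra MacBinary header, and the function names and file name are hypothetical. Treat it as an illustration rather than a validated parser.

```python
import struct


def abif_has_quality_values(data):
    """Return True if ABIF bytes contain a non-empty PCON entry.

    Assumed layout: 4-byte 'ABIF' signature, 2-byte version, then a
    directory entry describing where the tag directory lives.
    Truncated input raises struct.error.
    """
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file (missing 'ABIF' signature)")
    # Header entry fields: name, number, element type, element size,
    # element count, data size, data offset (all big-endian).
    _, _, _, entry_size, entry_count, _, dir_offset = struct.unpack(
        ">4sI2H3I", data[6:30])
    for i in range(entry_count):
        start = dir_offset + i * entry_size
        if data[start:start + 4] == b"PCON":
            # The element count sits 12 bytes into each directory entry.
            (count,) = struct.unpack(">I", data[start + 12:start + 16])
            if count > 0:
                return True
    return False


def file_has_quality_values(path):
    """Convenience wrapper that reads the file from disk."""
    with open(path, "rb") as handle:
        return abif_has_quality_values(handle.read())


# Usage (hypothetical file name):
# print(file_has_quality_values("sample_01.ab1"))
```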
Can I assemble ABI, SCF, and FASTA files?
Yes. ABI, SCF, FASTA, and SEQ files can be assembled together.