
FAQ about confidence scores (quality values)

DNA Sequence Assembler v3


What is the confidence score (quality value)?


Definition: The confidence score (also called base trust information, confidence value or quality value) is a number assigned to each base in a chromatogram that shows how much that base can be trusted. It can take any value from 0 to 100. A low confidence score means that the predicted base is unreliable (the base caller may be wrong); a high confidence score means that the base can be trusted. By default, DNA Baser considers a base untrusted if its confidence score is below 25-30.
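As a rough illustration of this definition, the Python sketch below flags the bases whose score falls below a cutoff. The function name and the default cutoff of 25 (taken from the low end of the 25-30 range) are assumptions made for the example; this is not DNA Baser's own code.

```python
def untrusted_positions(scores, threshold=25):
    """Return the indexes of bases whose confidence score is below the cutoff."""
    return [i for i, score in enumerate(scores) if score < threshold]


# Made-up base calls and scores, for illustration only.
bases = "ACGTNACG"
scores = [8, 12, 40, 55, 3, 61, 58, 20]

for i in untrusted_positions(scores):
    print(f"base {bases[i]} at position {i} is untrusted (score {scores[i]})")
```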

 

 

How important is the confidence score for DNA Sequence Assembler?

 

DNA Sequence Assembler is the only software on the market that automates the sequence assembly process, cutting the required time roughly tenfold.

To generate high-quality, trustworthy contigs WITHOUT any human intervention, DNA Sequence Assembler relies on the confidence score information assigned to each base, as follows:

 

1) Low-quality bases usually cluster at the ends of your samples. These clusters of low-quality bases are called untrusted regions. When performing sequence assembly or other DNA sequence analysis, the user has to manually cut out (trim) those bases from the samples; otherwise the assembly may be very poor (too many ambiguities) or even wrong. This is a time-consuming process!

DNA Sequence Assembler can automatically trim the untrusted regions before it assembles the sequences.

2) During sequence assembly, if an ambiguity is encountered, DNA Sequence Assembler will use the confidence score information to automatically correct the ambiguity for you.
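
The two steps above can be sketched in Python. This is a simplified illustration and not DNA Baser's actual algorithm: the function names, the sliding-window rule, the window size of 5 and the cutoff of 25 are all assumptions made for the example.

```python
def trim_untrusted_ends(bases, scores, threshold=25, window=5):
    """Trim low-quality ends using a sliding-window average (assumed rule)."""
    n = len(bases)
    # Start positions of every window whose mean score reaches the cutoff.
    good = [i for i in range(n - window + 1)
            if sum(scores[i:i + window]) / window >= threshold]
    if not good:
        return "", []  # the whole read is untrusted
    left, right = good[0], good[-1] + window
    # Drop any low-scoring bases left at the edges of the kept region.
    while scores[left] < threshold:
        left += 1
    while scores[right - 1] < threshold:
        right -= 1
    return bases[left:right], scores[left:right]


def resolve_mismatch(base_a, score_a, base_b, score_b):
    """Where two reads disagree, keep the call with the higher score."""
    if base_a == base_b:
        return base_a
    if score_a == score_b:
        return "N"  # neither call is better supported
    return base_a if score_a > score_b else base_b


# Made-up read, for illustration only.
read = "NNACGTACGTNN"
quality = [2, 5, 30, 45, 50, 60, 55, 48, 40, 35, 6, 3]
trimmed, _ = trim_untrusted_ends(read, quality)
print(trimmed)
print(resolve_mismatch("A", 12, "G", 47))
```

Without confidence scores neither function has anything to work with, which is why the missing-score case falls back to manual inspection.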

 

If the confidence score information is missing, DNA Baser will not automatically trim the untrusted regions. In this case the user will have to inspect the contig and check whether the suggestions made by DNA Baser are correct.

 

UPDATE

Starting with v4, DNA Sequence Assembler no longer needs the quality value (QV) info to be present in the file!

If the QV info is missing it will compute it using its new state-of-the-art base caller.

Most of the information below is therefore obsolete.

 

 

Which formats can store confidence score information?


Only SCF and ABI sample file types can store confidence score information. FASTA, SEQ and TXT files cannot store this information.
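
A minimal Python sketch of this rule, based on the file extension alone. The helper name and the exact extension list are assumptions (".ab1" is included because ABI trace files commonly use it); an extension only tells you whether the format can hold scores, not whether a given file actually does.

```python
from pathlib import Path

# Formats that can carry per-base confidence scores, per the answer above.
# The extension spellings are an assumption for this example.
SCORE_CAPABLE_EXTENSIONS = {".scf", ".abi", ".ab1"}


def can_store_confidence_scores(filename):
    """True if the file type is able to hold confidence scores."""
    return Path(filename).suffix.lower() in SCORE_CAPABLE_EXTENSIONS


for name in ["sample1.SCF", "sample2.ab1", "contig.fasta", "read.seq", "notes.txt"]:
    print(name, can_store_confidence_scores(name))
```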

 

 

If I have ABI or SCF files, does that mean they contain confidence scores?


Not necessarily. If your sequencing machine is set to generate SCF files, those files will almost certainly (99.9%) contain confidence score information. If it is set to generate ABI chromatogram files, some of the files may not contain confidence score information. However, your technician can fix this easily by setting the machine to store confidence scores in your ABI chromatogram files.

 

 

What should I do if my ABI files do not include the confidence score field?


Tell your technician or sequencing company to include confidence scores in your chromatogram files (ABI or SCF). All they have to do is tick a checkbox in the machine's software interface. In any case, DNA Sequence Assembler v4 will compute the confidence scores automatically if they are missing.

 

 

Why does DNA Sequence Assembler not automatically trim the low quality regions in my chromatograms?


Usually you don't have to manually trim the ambiguous bases (the 'N's) from your sequences. DNA Baser will clean the ends automatically for you, BUT ONLY IF the confidence scores are included in the file. SCF files almost always contain confidence scores; ABI files only sometimes do. You can set your sequencing machine to always generate ABI files with confidence scores included (you just need to tick a checkbox in the machine's software interface).

 

 

How can I find out if my samples contain confidence scores?


With DNA Baser, it is very easy to get this information: double-click an ABI/SCF file to open it in DNA Baser. If you see green columns above each base in the chromatogram, then your file has the confidence score field included.

 


Fig 1. ABI file with confidence score included

 


Fig 2. ABI file without confidence score included (green bars indicating the quality of each base are missing)
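
Outside DNA Baser, the same check can be approximated for ABI files with a short script. The Python sketch below assumes the file follows the published ABIF layout (an "ABIF" signature, a root directory entry at byte 6, 28-byte directory entries) and that per-base quality values are stored under the PCON tag. The function names are hypothetical, and older files that carry an extra header before the signature are not handled, so treat the result as a hint rather than a guarantee.

```python
import struct

# One ABIF directory entry, big-endian: tag name, tag number, element type,
# element size, element count, data size, data offset, data handle (28 bytes).
DIR_ENTRY = struct.Struct(">4sIHHIIII")


def abif_tag_names(data):
    """Return the set of four-letter tag names listed in an ABIF directory."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file (missing 'ABIF' signature)")
    # The root entry follows the 4-byte signature and 2-byte version number;
    # it gives the number of directory entries and where they start.
    root = DIR_ENTRY.unpack_from(data, 6)
    count, offset = root[4], root[6]
    names = set()
    for i in range(count):
        entry = DIR_ENTRY.unpack_from(data, offset + i * DIR_ENTRY.size)
        names.add(entry[0].decode("ascii", errors="replace"))
    return names


def has_confidence_scores(data):
    """True if the directory lists a PCON (per-base quality value) tag."""
    return "PCON" in abif_tag_names(data)


# Usage (hypothetical file name):
# with open("sample.ab1", "rb") as fh:
#     print(has_confidence_scores(fh.read()))
```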

 

 

 

Another way to check for confidence scores is to view the properties of a file. Press Control+Enter to open the properties dialog.

 

  

 

 


If my samples do not contain confidence scores, does it mean that I can’t make a contig?


In most cases, DNA Baser will be able to generate contigs without problems even if your samples do not contain the confidence score field. However, in some very rare cases (especially when your samples are extremely poor), DNA Baser may not succeed in creating a contig. In this case, you must manually trim the untrusted ends of your samples/chromatograms.

 

 

Can I mix ABI and SCF files?


Sure. You can assemble ABI, SCF, FASTA and SEQ files together.

 

 

Can I mix samples that contain confidence scores with samples that do not (like SCF and FASTA)?


Yes. However, DNA Baser will then create the contig in non-confidence score mode. It may be better to leave the file without confidence scores out of the contig entirely than to use it. See the table below:

 

Files containing confidence score info    Contig will be done in
All                                       confidence score mode
None                                      non-confidence score mode
Some                                      non-confidence score mode
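
The rule in the table amounts to "confidence score mode only if every file has scores". A minimal Python sketch, assuming a hypothetical helper that receives one boolean per input file:

```python
def assembly_mode(files_have_scores):
    """Pick the assembly mode from one boolean per input file."""
    flags = list(files_have_scores)
    # Confidence score mode requires scores in every file; a single file
    # without them switches the whole contig to non-confidence score mode.
    if flags and all(flags):
        return "confidence score mode"
    return "non-confidence score mode"


print(assembly_mode([True, True, True]))    # all files have scores
print(assembly_mode([True, False, True]))   # some files have scores
print(assembly_mode([False, False]))        # no file has scores
```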

 


In rare situations, DNA Baser generates poor contigs. Why?


Because your files contain no confidence score information. You may need to trim the untrusted regions manually.

 

 


 

Copyright © BioSoft