Here you can instruct DNA Baser how to assemble your samples and how to clean their untrusted regions. If the quality of your samples is average or good, the default assembly parameters will give good results, so you usually do not have to change them.
We have tried to make our product as easy to use as possible. In most cases you won't have to manually tweak the assembly parameters. Just use the "Optimize for samples with" function below and DNA Baser will choose the appropriate parameters for the Trimming Engine and Assembler Engine.
If your samples are of low quality, DNA Baser may trim them too much, leaving an overlapping region between two sequences that is shorter than 25 bases (the default value of the 'Minimum overlap' parameter). In low quality samples, sequencing errors may also result in low identity between the sequences. In both cases the sequences will either assemble incorrectly (you will see lots of mismatches) or not assemble at all. When this happens, you need to adjust the Trimming Engine and Assembler Engine parameters manually. Relax these parameters and try again.
The parameters of the ASSEMBLER ENGINE are word size, identity percent and minimum overlap (see figure above). These settings are very important and changing them will greatly affect the accuracy of the assembly process.
IDENTITY PERCENT - for two sequences to form a contig, they need to have an overlapping region. The IDENTITY PERCENT is the minimum percentage of identity that this region must have.
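The relation between the overlap and the identity percent can be illustrated with a minimal Python sketch. It is not DNA Baser's actual implementation: the function names are hypothetical, the comparison is a simple position-by-position match of two already aligned overlap strings of equal length (no gaps), and only the minimum overlap of 25 bases comes from the defaults described above.

```python
def identity_percent(overlap_a: str, overlap_b: str) -> float:
    """Percentage of positions at which two aligned overlap regions agree.

    Assumption: both strings are already aligned and have the same length.
    """
    if len(overlap_a) != len(overlap_b):
        raise ValueError("overlap regions must have the same length")
    if not overlap_a:
        return 0.0
    matches = sum(1 for a, b in zip(overlap_a, overlap_b) if a == b)
    return 100.0 * matches / len(overlap_a)


def can_form_contig(overlap_a: str, overlap_b: str,
                    min_identity: float, min_overlap: int = 25) -> bool:
    """Hypothetical check: the overlap must be long enough AND similar enough."""
    if len(overlap_a) < min_overlap:
        return False  # overlap shorter than the 'Minimum overlap' parameter
    return identity_percent(overlap_a, overlap_b) >= min_identity


# Example: a 30-base overlap with 3 mismatches has 90% identity.
a = "ACGTACGTACGTACGTACGTACGTACGTAC"
b = "ACGTACGTACTTACGTACGAACGTACGTAG"
print(identity_percent(a, b))
print(can_form_contig(a, b, min_identity=85.0))
```

Lowering `min_identity` or `min_overlap` in this sketch corresponds to relaxing the assembler parameters: more pairs of sequences pass the check, at the cost of accepting weaker overlaps.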
DNA Sequence Assembler automatically removes the untrusted regions from sample files whenever it imports the samples from disk. Here you can see a video tutorial showing how the automatic trimming of the untrusted regions of a chromatogram works.
Setting the parameters of the TRIMMING ENGINE
Three parameters determine how much of each end of a chromatogram is recognized as an untrusted region (see figure below). The first is the percentage of good bases, the second is the window size and the third is the threshold confidence score.
The threshold confidence score sets the minimum confidence score at which a base is considered to have been correctly recognized by the base caller (a "good" base). The bad end recognition algorithm moves along the sequence in units of a fixed number of bases, called windows. The size of these units can be changed by the user, as indicated in the figure above. When the percentage of good bases in a window is lower than the required percentage of good bases, the window is marked as untrusted. The window is then moved along the sequence until the percentage of good bases is at least equal to the required percentage. When this happens, the software checks the first bases in the window and marks them as untrusted until the first good base is found.
DNA Sequence Assembler creates an imaginary window (see the light-blue rectangle in the picture below) and places it at the beginning of the sequence. This window is 18 bases wide. If 75% of the bases inside this window are good, DNA Baser has found the first high quality region in your sample and it stops the trimming process. If this condition is not met, it moves the window one base to the left and repeats the process until the condition is met.
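The window procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code DNA Baser runs: the function name `find_trim_start` is hypothetical, the confidence scores are assumed to be a plain list with one number per base, and the sketch handles only the start of a read, advancing the window by one base per step along the read. The window size of 18 and the 75% requirement are the values quoted above; the threshold confidence score has no default here and must be supplied.

```python
def find_trim_start(scores, threshold, window_size=18, good_percent=75.0):
    """Return the index of the first trusted base at the start of a read.

    scores       -- one confidence score per base (assumed input format)
    threshold    -- minimum score for a base to count as "good"
    window_size  -- width of the sliding window, in bases
    good_percent -- required percentage of good bases inside the window

    Returns len(scores) when no window satisfies the condition
    (including reads shorter than the window).
    """
    n = len(scores)
    for start in range(0, n - window_size + 1):
        window = scores[start:start + window_size]
        good = sum(1 for s in window if s >= threshold)
        if 100.0 * good / window_size >= good_percent:
            # First acceptable window found: skip any leading bad bases
            # inside it, so trimming ends at the first good base.
            for offset, s in enumerate(window):
                if s >= threshold:
                    return start + offset
    return n


# Example: 10 poor bases followed by 30 good ones.
read_scores = [5] * 10 + [40] * 30
print(find_trim_start(read_scores, threshold=20))
```

In this example the first window that contains at least 75% good bases starts at index 6, and the four poor bases still inside it are skipped, so the trusted region begins at index 10. Lowering `good_percent` or `threshold` makes the sketch trim less, which is the sense in which the trimming parameters can be relaxed for low quality samples.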
DNA Baser Assembler has an internal algorithm that allows it to make decisions automatically based on the confidence scores of the peaks in your chromatogram files. If the confidence score information is missing, DNA Baser will consider all peaks as having maximum quality (100). The confidence score information is important for the trimming engine and for error correction. If your sequencing machine is able to produce both SCF and ABI files, we STRONGLY recommend that you use the SCF format instead of ABI.
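The fallback for missing confidence scores can be pictured with the short Python sketch below. The function name `effective_scores` and the use of `None` to represent missing scores are assumptions made for illustration; only the maximum quality value of 100 comes from the text above.

```python
MAX_QUALITY = 100  # value assigned to every peak when no scores are available


def effective_scores(bases, scores=None):
    """Return the scores to use for a read (hypothetical helper).

    If the file carried no confidence scores (scores is None), every base
    is treated as having maximum quality.
    """
    if scores is None:
        return [MAX_QUALITY] * len(bases)
    if len(scores) != len(bases):
        raise ValueError("expected one confidence score per base")
    return list(scores)


print(effective_scores("ACGT"))
print(effective_scores("ACGT", [12, 35, 40, 8]))
```

With every base at maximum quality, a score-based trimming step has nothing to remove, which is why files that include real confidence scores are preferable.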
More assembly parameters
Copyright © Heracle BioSoft SRL 2020