WebAssembly is also used outside the browser, for example as an efficient and flexible language for serverless computing.
To explore whether we could leverage WebAssembly to speed up our web app, we looked for a tool that calculates quality control (QC) metrics. Specifically, we sought a tool written in C, C++, or Rust so that it would be amenable to porting to WebAssembly.
After some research, we chose to go with seqtk, a commonly used, open-source tool written in C that can help us assess the quality of sequencing data (and is more generally used to manipulate those data files).
# About the command line
Although there are scores of command-line tools available to generate such quality control reports, the goal of fastq.bio is to provide an interactive preview of data quality without leaving the browser. This is especially useful for scientists who are not comfortable using the command line.
With seqtk, here's the architecture:
There is one more improvement we looked into. So far, fastq.bio gets the metrics of interest by calling two different C functions, each of which calculates a different set of metrics. Specifically, one function returns information in the form of a histogram (i.e. a list of values that we bin into ranges), whereas the other function returns information as a function of DNA sequence position. Unfortunately, this means that the same chunk of the file is read twice, which is unnecessary.
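To make the idea of avoiding the second read concrete, here is a minimal sketch of computing both kinds of metrics in a single pass. It is written in TypeScript purely for illustration; the real functions live in seqtk's C code, and the names `ChunkMetrics` and `singlePassMetrics`, the choice of read length for the histogram, and the Phred+33 quality encoding are all assumptions made for this example.

```typescript
// Hypothetical types and names for illustration; not taken from seqtk or fastq.bio.
interface ChunkMetrics {
  // Histogram-style metric: number of reads per read-length bin (keyed by bin start).
  lengthHistogram: Map<number, number>;
  // Position-style metric: mean quality score at each position along the read.
  meanQualityByPosition: number[];
}

// Compute both metric sets while visiting each read exactly once.
// `qualities` holds one FASTQ quality string per read (Phred+33 encoding assumed).
function singlePassMetrics(qualities: string[], binWidth: number): ChunkMetrics {
  const lengthHistogram = new Map<number, number>();
  const sums: number[] = [];
  const counts: number[] = [];

  for (const q of qualities) {
    // Metric set 1: bin the read length.
    const bin = Math.floor(q.length / binWidth) * binWidth;
    lengthHistogram.set(bin, (lengthHistogram.get(bin) || 0) + 1);

    // Metric set 2: accumulate the quality score seen at each position.
    for (let i = 0; i < q.length; i++) {
      sums[i] = (sums[i] || 0) + (q.charCodeAt(i) - 33);
      counts[i] = (counts[i] || 0) + 1;
    }
  }

  // Turn the per-position sums into means.
  const meanQualityByPosition = sums.map((s, i) => s / counts[i]);
  return { lengthHistogram, meanQualityByPosition };
}
```

The point of the sketch is only the shape of the loop: each read is touched once, and both accumulators are updated inside the same iteration, instead of two separate functions each scanning the chunk.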
The improvement came from literally commenting out printf statements that weren't needed, which is an excellent result given how easy it was to achieve.
Once the metrics are calculated for that chunk of data, we plot the results with Plotly.js and move on to the next chunk in the file. The reason for processing the file in small chunks is simply to improve the user experience: processing the entire file at once would take too long, because FASTQ files are often in the hundreds of gigabytes. We found that a chunk size between 0.5 MB and 1 MB makes the application more seamless and returns information to the user more quickly, but this number will vary depending on the details of your application and how heavy the computations are.
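The chunk-by-chunk loop described above could be sketched roughly as follows. This is not fastq.bio's actual code: `chunkRanges`, `processInChunks`, the `Sliceable` interface, and the `computeMetrics` and `plot` callbacks are hypothetical names, with the callbacks standing in for the WebAssembly call and the Plotly.js update respectively. A browser `File` has a `size` and a `slice(start, end)` method, so it should fit the `Sliceable` shape assumed here.

```typescript
// Minimal shape we rely on; a browser File or Blob has these members.
interface Sliceable {
  size: number;
  slice(start: number, end: number): Sliceable;
}

// Split `fileSize` bytes into consecutive [start, end) byte ranges.
function chunkRanges(fileSize: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, fileSize)]);
  }
  return ranges;
}

// Hypothetical driver: compute metrics for one chunk, plot, then move on.
// `computeMetrics` and `plot` are placeholders supplied by the caller.
async function processInChunks(
  file: Sliceable,
  chunkSize: number,
  computeMetrics: (chunk: Sliceable) => Promise<unknown>,
  plot: (metrics: unknown) => void
): Promise<void> {
  for (const [start, end] of chunkRanges(file.size, chunkSize)) {
    const metrics = await computeMetrics(file.slice(start, end));
    plot(metrics); // the user sees updated results after every chunk
  }
}
```

With a chunk size of 1 MB, for example, `chunkRanges(2500000, 1000000)` yields three ranges, the last one shorter than the others. Note that this sketch cuts at fixed byte offsets and ignores the fact that a FASTQ record may straddle a chunk boundary; a real implementation would need to handle that.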
The web app we'll work with is fastq.bio, an interactive web tool that provides scientists with a quick preview of the quality of their DNA sequencing data; sequencing is the process by which we read the “letters” (i.e. nucleotides) in a DNA sample.

Note: This article delves into some advanced topics such as compiling C code, but don't worry if you don't have experience with that; you will still be able to follow along and get a sense for what's possible with WebAssembly.
We won't go into the details of the calculations, but in brief, the plots above give scientists a sense of how well the sequencing went and are used to identify data quality problems at a glance.