SoftwarePhysics: An IT and Geophysical Perspective on the Nanopore Sequencing of DNA and RNA

In this post, I would like to cover a remarkable new tool, called nanopore sequencing, for very rapidly reading the sequence of bases on very long stretches of DNA and RNA. Nanopore sequencing devices can directly read a stretch of DNA or RNA that is over one million bases in length at a speed of about 450 bases/second with a device about the size of a flip smartphone. These nanopore sequencing devices now start at a price of about $1,000 and can plug into the USB port of your laptop. They are also very rugged, and can even work in very harsh field conditions simply using the power from your laptop battery. Prior to nanopore sequencing, DNA and RNA were sequenced using very time-consuming and expensive biochemical procedures. These prior procedures did not directly read DNA and RNA base sequences themselves.

Figure 1 – Above is a MinION nanopore sequencer being used in the field.

Figure 2 – Above is the general setup of a MinION nanopore sequencer. The DNA or RNA sample to be read is placed into the flow cell of the unit.

The development of nanopore sequencing should be of interest to all IT professionals because it is truly an extraordinary story of information processing capability that rivals that which has occurred in IT over the past 82 years, or 2.6 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941. First, it is important to remember that whenever data is written from memory to a permanent secondary medium that persists in time, such as a thumb drive, SSD or HDD drive, or even the magnetic tape of yore, it is always done so in a sequential manner. Each byte of data that is written to a permanent secondary medium is written in a sequential manner - one byte after the other. The written information in each byte is naturally important, but even more so is the sequence of the bytes that are written out to the permanent secondary medium. For example, each character in one of your emails is encoded as a single byte of information using the characters in the ASCII encoding table. However, the true essence of the information in your email is encoded by the total order and sequence of the bytes in the email. The same is true of the total order and sequence of the bases in a stretch of DNA or RNA used to build a protein molecule. That is why being able to read the sequence of bases in a string of DNA or RNA is so important. It is even more important than being able to read the bytes in your email.

Biological and IT Data Access Methods
Before proceeding, we first need to briefly review how data is encoded in IT and by carbon-based life forms in biology. In biology, data is encoded by DNA and RNA molecules.

Figure 3 - RNA is a one-track tape, while DNA is a two-track tape. DNA has a data track and a parity track that allows for error corrections after DNA replicates. DNA uses a slightly different version of the ribose sugar and also uses the nucleotide of T (Thymine) instead of the U (Uracil) used by RNA.

For IT, we will return to the batch processing of data stored on magnetic tapes that was common in the 1960s and 1970s because it is more closely aligned with how biological information is processed using DNA and RNA tapes. One of the simplest and oldest access methods in IT is called QSAM - Queued Sequential Access Method:

Queued Sequential Access Method
http://en.wikipedia.org/wiki/Queued_Sequential_Access_Method

I did a lot of magnetic tape processing in the 1970s and early 1980s using QSAM. At the time we used 9 track tapes that were 1/2 inch wide and 2400 feet long on a reel with a 10.5 inch diameter. The tape had 8 data tracks and one parity track across the 1/2-inch tape width. That way we could store one byte across the 8 1-bit data tracks in a frame, and we used the parity track to check for errors. We used odd parity. If the 8 bits on the 8 data tracks in a frame added up to an even number of 1s, we put a 1 in the parity track to make the total number of 1s an odd number. If the 8 bits added up to an odd number of 1s, we put a 0 in the parity track to keep the total number of 1s an odd number. Originally, 9 track tapes had a density of 1600 bytes/inch of tape, with a data transfer rate of 15,000 bytes/second. Remember, a byte is 8 bits and can store one character, like the letter “A” which we encode in the ASCII code set as A = “01000001”.
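For IT readers, the odd parity rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text. The function names odd_parity_bit and frame_is_valid are made up for this example and do not come from any real tape-drive software.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the parity bit that makes the 9-bit frame hold an odd number of 1s."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in 8 data bits")
    ones = bin(byte).count("1")
    # Even count of 1s on the data tracks -> write a 1; odd count -> write a 0.
    return 1 if ones % 2 == 0 else 0


def frame_is_valid(byte: int, parity: int) -> bool:
    """A frame passes the check when data bits plus parity bit hold an odd number of 1s."""
    return (bin(byte).count("1") + parity) % 2 == 1


letter_a = ord("A")  # 01000001 in ASCII, which has two 1 bits
print(format(letter_a, "08b"), odd_parity_bit(letter_a))
```

Note that a single parity bit can only detect an odd number of flipped bits in a frame. It cannot tell which bit went bad, so it allows detection but not correction.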

Figure 4 – A 1/2 inch wide 9 track magnetic tape on a 2400 foot reel with a diameter of 10.5 inches

Figure 5 – 9 track magnetic tape had 8 data tracks and one parity track using odd parity which allowed for the detection of bad bytes with parity errors on the tape.

Later, 6250 bytes/inch tape drives became available, and I will use that density for the calculations that follow. Now suppose you had 50 million customers and the current account balance for each customer was stored on an 80-byte customer record. A record was like a row in a spreadsheet. The first field of the record was usually a CustomerID field that contained a unique customer ID like a social security number and was essentially the equivalent of a promoter region on the front end of a gene in DNA. The remainder of the 80-byte customer record contained fields for the customer’s name and billing address, along with the customer’s current account information.

Between each block of data on the tape, there was a 0.5-inch gap of “junk” tape. This “junk” tape allowed for the acceleration and deceleration of the tape reel as it spun past the read/write head of a tape drive and perhaps occasionally reversed direction. Since an 80-byte record only came to 80/6250 = 0.0128 inches of tape, which is quite short compared to the overhead of the 0.5-inch gap of “junk” tape between records, it made sense to block many records together into a single block of data that could be read by the tape drive in a single I/O operation. For example, blocking 100 80-byte records increased the block size to 8000/6250 = 1.28 inches and between each 1.28-inch block of data on the tape, there was the 0.5-inch gap of “junk” tape. This greatly reduced the amount of wasted “junk” tape on a 2400-foot reel of tape. So each 100-record block of data took up a total of 1.78 inches of tape and we could get 16,180 blocks on a 2400-foot tape or the data for 1,618,000 customers per tape.

The advantage of QSAM, over an earlier sequential access method known as BSAM, was that you could read and write an entire block of records at a time via an I/O buffer. In our example, a program could read one record at a time from an I/O buffer which contained the 100 records from a single block of data on the tape. When the I/O buffer was depleted of records, the next 100 records were read in from the next block of records on the tape. Similarly, programs could write one record at a time to the I/O buffer, and when the I/O buffer was filled with 100 records, the entire I/O buffer with 100 records in it was written as the next block of data on an output tape.
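The blocking arithmetic above is easy to check with a short Python sketch. It uses only the figures given in the text (a 2400-foot reel, 6250 bytes/inch, 80-byte records and a 0.5-inch gap). The function names are hypothetical, and the sketch assumes that every block is followed by exactly one gap.

```python
import math

TAPE_LENGTH_INCHES = 2400 * 12   # a 2400-foot reel
DENSITY_BYTES_PER_INCH = 6250
RECORD_BYTES = 80
GAP_INCHES = 0.5                 # "junk" tape after each block


def blocks_per_tape(records_per_block: int) -> int:
    """Whole blocks (data plus one gap each) that fit on one reel."""
    block_inches = records_per_block * RECORD_BYTES / DENSITY_BYTES_PER_INCH
    return math.floor(TAPE_LENGTH_INCHES / (block_inches + GAP_INCHES))


def records_per_tape(records_per_block: int) -> int:
    """Customer records per reel for a given blocking factor."""
    return blocks_per_tape(records_per_block) * records_per_block


# Compare unblocked records (factor 1) with the 100-record blocks from the text.
for factor in (1, 100):
    print(factor, blocks_per_tape(factor), records_per_tape(factor))
```

Rounding down to whole blocks gives 16,179 blocks for a blocking factor of 100, which the text above rounds to 16,180. With unblocked records, the same reel would hold only about 56,000 customers because nearly all of the tape would be gaps.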

The use of a blocked I/O buffer provided a significant distinction between the way data was physically stored on tape and the way programs logically processed the data. The difference between the way things are physically implemented and the way things are logically viewed by software is a really big deal in IT. The history of IT over the past 82 years has really been a history of logically abstracting physical things through the increasing use of layers of abstraction, to the point where today, IT professionals rarely think of physical things at all. Everything just resides in a logical “Cloud”. I think that taking more of a logical view of things, rather than taking a physical view of things, would greatly help biologists at this point in the history of biology. Biologists should not get so hung up about where the information for biological software is physically located. Rather, biologists should take a cue from IT professionals, and start thinking more of biological software in logical terms, rather than physical terms.

Figure 6 – Between each record, or block of records, on a magnetic tape, there was a 0.5-inch gap of “junk” tape. The “junk” tape allowed for the acceleration and deceleration of the tape reel as it spun past the read/write head on a tape drive. Since an 80-byte record only came to 80/6250 = 0.0128 inches, it made sense to block many records together into a single block that could be read by the tape drive in a single I/O operation. For example, blocking 100 80-byte records increased the block size to 8000/6250 = 1.28 inches, and between each 1.28-inch block of data on the tape, there was a 0.5-inch gap of “junk” tape for a total of 1.78 inches per block.

Figure 7 – Blocking records on tape allowed data to be stored more efficiently.

So it took 31 tapes just to store the rudimentary account data for 50 million customers. The problem was that each tape could only store 123 MB of data. Not too good, considering that today you can buy a 1 TB PC disk drive that can hold 8525 times as much data for about $50! Today, you could also store about 4,263 times as much data on a $50 512 GB thumb drive. So how could you find the data for a particular customer on 74,000 feet (14 miles) of tape? Well, you really could not do that reading one block of data at a time with the read/write head of a tape drive, so we processed data with batch jobs using lots of input and output tapes. Generally, we had a Master Customer File on 31 tapes and a large number of Transaction tapes with insert, update and delete records for customers. All the tapes were sorted by the CustomerID field, and our programs would read a Master tape and a Transaction tape at the same time and apply the inserts, updates and deletes on the Transaction tape to a new Master tape. So your batch job would read a Master and Transaction input tape at the same time and would then write to a single new Master output tape. These batch jobs would run for many hours, with lots of mounting and unmounting of dozens of tapes.
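The Master and Transaction tape processing described above can be sketched as a single sequential pass over two sorted inputs. This is a minimal sketch in Python rather than the COBOL of the period. The record layouts, the action names and the function apply_transactions are assumptions made for illustration, and updates or deletes for a CustomerID that is not on the Master tape are simply ignored.

```python
from typing import Iterable, Iterator, Tuple

MasterRecord = Tuple[int, str]        # (customer_id, account_data)
Transaction = Tuple[int, str, str]    # (customer_id, action, account_data)


def apply_transactions(master: Iterable[MasterRecord],
                       transactions: Iterable[Transaction]) -> Iterator[MasterRecord]:
    """Read two inputs sorted by customer_id and yield the new Master in order."""
    master_iter, trans_iter = iter(master), iter(transactions)
    m = next(master_iter, None)
    t = next(trans_iter, None)
    while m is not None or t is not None:
        if t is None or (m is not None and m[0] < t[0]):
            yield m                          # no more transactions for this customer
            m = next(master_iter, None)
        elif m is None or t[0] < m[0]:
            if t[1] == "insert":             # new customer goes out in sorted order
                yield (t[0], t[2])
            t = next(trans_iter, None)
        else:                                # same customer_id on both tapes
            if t[1] == "update":
                m = (m[0], t[2])             # hold the updated record until written
            elif t[1] == "delete":
                m = next(master_iter, None)  # skip the old record entirely
            t = next(trans_iter, None)


old_master = [(1, "a"), (3, "c"), (5, "e")]
changes = [(2, "insert", "b"), (3, "update", "C"), (5, "delete", "")]
print(list(apply_transactions(old_master, changes)))
```

Because both inputs are sorted by CustomerID, neither tape ever has to be rewound. Each record is read once and written once, which is why sorting the tapes first was so important.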

Figure 8 – Batch processing of 50 million customers took a lot of tapes and tape drives.

Biological Access Methods
Nearly all biological functions are performed by proteins. A protein is formed by combining 20 different amino acids into different sequences, and on average it takes about 400 amino acids strung together to form a functional protein. The information to do that is encoded in base pairs running along a strand of DNA. Each base can be in one of four states – A, C, G, or T, and an A will always be found to pair with a T, while a C will always pair with a G. So DNA is really a 2 track tape with one data track and one parity track. For example, if there is an A on the DNA data track, you will find a T on the DNA parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.
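The data track and parity track idea can be sketched in Python using the base pairing rules given above. This is a toy model only. The names COMPLEMENT, parity_track and find_mismatches are made up for this example, and the sketch ignores the fact that the two strands of real DNA run in opposite directions.

```python
# Base pairing rules from the text: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parity_track(data_track: str) -> str:
    """Build the parity track that pairs with a given data track."""
    return "".join(COMPLEMENT[base] for base in data_track)


def find_mismatches(data_track: str, parity: str) -> list:
    """Return the positions where the two tracks do not form a valid base pair."""
    return [i for i, (d, p) in enumerate(zip(data_track, parity))
            if COMPLEMENT[d] != p]


data = "ATGC"
print(parity_track(data))             # the matching parity track
print(find_mismatches(data, "TACC"))  # position of the bad base pair
```

Unlike the single parity bit on a 9 track tape, a mismatched base pair pinpoints the position of the error. Deciding which of the two tracks holds the bad base takes additional information, which this sketch does not model.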

Figure 9 – DNA is a two-track tape, with one data track and one parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.

Now a single base pair can code for 4 different amino acids because a single base pair can be in one of 4 states. Two base pairs can code for 4 x 4 = 16 different amino acids, which is not enough. Three base pairs can code for 4 x 4 x 4 = 64 amino acids, which is more than enough to code for 20 different amino acids. So it takes a minimum of three bases to fully encode the 20 different amino acids, leaving 44 combinations left over for redundancy. Biologists call these three base pair combinations a “codon”, but a codon really is just a biological byte composed of three biological bits or base pairs that code for an amino acid. Actually, three of the base pair combinations, or codons, are used as STOP codons – TAA, TAG and TGA which are essentially end-of-file markers designating the end of a gene along the sequential file of DNA. As with magnetic tape, there is a section of “junk” DNA between genes along the DNA 2 track tape. According to Shannon’s equation, a DNA base contains 2 bits of information, so a codon can store 6 bits. For more on this see Some More Information About Information.
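The counting argument above can be checked with a few lines of Python. The sketch assumes that all four bases are equally likely, which is what gives log2(4) = 2 bits per base. The function name min_codon_length is made up for this example.

```python
import math
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def min_codon_length(symbols_needed: int, alphabet_size: int = len(BASES)) -> int:
    """Smallest number of bases whose combinations can cover symbols_needed."""
    n = 1
    while alphabet_size ** n < symbols_needed:
        n += 1
    return n


# Enumerate every possible three-base codon.
codons = ["".join(c) for c in product(BASES, repeat=3)]

bits_per_base = math.log2(len(BASES))   # assumes equally likely bases
bits_per_codon = 3 * bits_per_base

print(min_codon_length(20), len(codons), len(codons) - 20, bits_per_codon)
```

Six bits per codon means that a biological byte is a little smaller than the 8-bit bytes that were stored across the data tracks of a 9 track tape.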

Figure 10 – Three bases combine to form a codon, or a biological byte, composed of three biological bits, and encodes the information for one amino acid along the chain of amino acids that form a protein.

The beginning of a gene is denoted by a section of promoter DNA that identifies the beginning of the gene, like the CustomerID field on a record, and the gene is terminated by a STOP codon of TAA, TAG or TGA. Just as there was a 0.50-inch gap of “junk” tape between blocks of records on a magnetic computer tape, there is a section of “junk” DNA between each gene along the 6 feet of DNA tape found within human cells.

Figure 11 - On average, each gene is about 400 codons long and ends in a STOP codon TAA, TAG or TGA which are essentially end-of-file markers designating the end of a gene along the sequential file of DNA. As with magnetic tape, there is a section of “junk” DNA between genes which is shown in grey above.

In order to build a protein, genes are first transcribed to an I/O buffer called mRNA. The 2-track DNA file for a gene is first opened near the promoter of a gene and an enzyme called RNA polymerase then begins to copy the codons or biological bytes along the data track of the DNA tape to an mRNA I/O buffer. The mRNA I/O buffer is then read by a ribosome read/write head as it travels along the mRNA I/O buffer. The ribosome read/write head reads each codon or biological byte of data along the mRNA I/O buffer and writes out a chain of amino acids as tRNA brings in one amino acid after another in the sequence specified by the mRNA I/O buffer.
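The read loop of the ribosome read/write head can be sketched in Python as follows. This is a toy model. The codon table holds only four of the 61 amino acid codons, the function names are made up for this example, and transcription is simplified to copying the DNA data track with U substituted for T.

```python
# A small subset of the standard genetic code, written in mRNA letters.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys"}
STOP_CODONS = {"UAA", "UAG", "UGA"}   # end-of-file markers in mRNA form


def transcribe(dna_data_track: str) -> str:
    """Copy the DNA data track to an mRNA I/O buffer (T becomes U)."""
    return dna_data_track.replace("T", "U")


def translate(mrna: str) -> list:
    """Read one codon at a time and stop at the first STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from the small table are marked as unknown.
        protein.append(CODON_TABLE.get(codon, "???"))
    return protein


gene = "ATGTTTGGCAAATAAGGC"           # hypothetical gene ending in a TAA STOP codon
print(translate(transcribe(gene)))
```

The bases after the STOP codon are never read, just as a program stops reading records when it hits an end-of-file marker.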

Figure 12 - In order to build a protein, genes are first transcribed to an I/O buffer called mRNA. The 2-track DNA file for a gene is first opened near the promoter of a gene and an enzyme called RNA polymerase then begins to copy the codons or biological bytes along the data track of the DNA tape to the mRNA I/O buffer. The mRNA I/O buffer is then read by a ribosome read/write head as it travels along the mRNA I/O buffer. The ribosome read/write head reads each codon or biological byte of data along the mRNA I/O buffer and writes out a chain of amino acids as tRNA brings in one amino acid after another in the sequence specified by the mRNA I/O buffer.

Figure 13 – In addition, the DNA of eukaryotic carbon-based life that is composed of cells that are more complicated than the simple prokaryotic cells of the bacteria and archaea is wrapped around histone proteins like magnetic computer tape wrapped around little reels, forming nucleosomes, and then is packed into chromatin fibers that are then wound up into chromosomes.

Figure 14 – Chromatin performs the functions of the tape racks of yore and allows DNA to be highly compacted for storage and also allows for the controlled expression of genes by means of epigenetic factors in play. Each tape in a rack had an external label known as a volume serial number which identified the tape.

How Nanopore Sequencing of DNA and RNA Works
With the above background at hand, let us now explore the engineering that allows nanopore sequencing to work. A very complete history of this very important technology can be found at:

Nanopore Sequencing
https://www.whatisbiotechnology.org/index.php/science/summary/nanopore/nanopore-sequencing-makes-it-possible-to-decode-the

It all began on June 25, 1989, while Dave Deamer was on a Sunday drive in Oregon.

Figure 15 – Above is Dave Deamer's conceptual sketch of how a DNA sequencer could directly read DNA one base at a time like the read/write head of a tape drive as a strand of DNA passed through a small hole in a membrane. He drew the above sketch after pulling over to the side of the road during a one-hour drive in Oregon.

The text reads:
Sunday June 25 1989. Driving back from Eugene -> Belmont Lodge, had an idea on how to sequence DNA directly.

Main concept: DNA will be driven through a small channel, either by ΔY or ΔpH. The channel will be carrying a current, driven by ΔΨ. As each base passes through, a change in the current will occur. Because the bases are of different size, the current change will be proportional, thereby providing an indication of which base it is.

Details: The thickness of the membrane must be very thin, perhaps a polymerized bilayer. The channel must be of the dimensions of DNA in cross section, approx. 1-2nm. Porin? Complement? Alamethicin? The ion flux might be protonic.

In the above scheme, ΔY and ΔΨ are both voltage differences ΔV across a polymerized bilayer membrane.

Dave Deamer is truly a membrane expert. We have seen him put them to good use in the Hot Spring Origins Hypothesis that Dave Deamer and Bruce Damer developed for the origin of carbon-based life on the Earth about four billion years ago. For more on that see The Bootstrapping Algorithm of Carbon-Based Life and Urability Requires Durability to Produce Galactic Machine-Based Intelligences. To fully understand nanopore sequencing you need to understand membranes.

Figure 16 – A cell membrane consists of a phospholipid bilayer with embedded molecules that allow for a controlled input-output to the cell. Once we have a membrane, we can fill the "inside" with organic molecules that are capable of doing things that then interact with organic molecules on the "outside".

Figure 17 – Water molecules are polar molecules that have a positive end and a negative end because oxygen atoms attract the bonding electrons more strongly than do the hydrogen atoms. The positive ends of water molecules attract the negative ends of other water molecules to form a loosely coupled network of water molecules with a minimum of free energy.

Figure 18 – How soap and water work. The lipids in a bar of soap have water-loving polar heads and water-hating nonpolar tails. When in water, the soap lipids can form a spherical micelle that has all of the water-hating nonpolar tails facing inwards. Then the spherical micelles can surround the greasy nonpolar molecules of body oils and allow them to be flushed away by a stream of polar water molecules. The lipids in a bar of soap can also form a cell-like liposome with a bilayer of lipid molecules that can surround the monomers and polymers of life.

Similarly, in The Role of Membranes in the Evolution of Software, I explained how the isolation of processing functions within membranes progressed as the architecture of software slowly evolved over time.

Figure 19 – Above is a general view of how a nanopore sequencer works. It consists of a bilayer membrane with a hole drilled through it. The hole is lined with a pipe-like protein molecule called Alpha-hemolysin to keep it open. A battery is then used to apply a small voltage difference between the inside and the outside of the membrane. The fluid below the membrane is now at a higher voltage. The bases on a strand of DNA or RNA have a slight negative charge, so the electric force from the voltage difference across the membrane will pull the DNA or RNA through the membrane hole. It will also pull lots of negatively charged ions through the hole producing a current. The bases on DNA and RNA have different sizes. The larger bases will clog up the hole more than the smaller bases. This will cause the amount of current flowing through the hole to fluctuate as the DNA or RNA strand is pulled through by the electric field. By measuring the current flow through the hole, one can see each base pass by like the bits passing by the read/write head of a tape drive.

In this regard, the nanopore sequencer behaves like a vacuum tube or a transistor to modify the current flowing from the outside of the membrane to the inside of the membrane.

Figure 20 – Vacuum tubes contain a hot negative cathode that glows red and boils off electrons. The electrons are attracted to the cold positive anode plate, but there is a grid electrode between the cathode and anode plate. By changing the voltage on the grid, the vacuum tube can control the flow of electrons like the handle of a faucet. The grid voltage can be adjusted so that the electron flow is full blast, a trickle, or completely shut off, and that is how a vacuum tube can be used as a switch.

Figure 21 – A FET transistor consists of a source, gate and drain. When a positive voltage is applied to the gate, a current of electrons can flow from the source to the drain and the FET acts like a closed switch that is “on”. When there is no positive voltage on the gate, no current can flow from the source to the drain, and the FET acts like an open switch that is “off”.

Figure 22 – When there is no positive voltage on the gate, the FET transistor is switched off, and when there is a positive voltage on the gate the FET transistor is switched on. These two states can be used to store a binary “0” or “1”, or can be used as a switch in a logic gate, just like an electrical relay or a vacuum tube.

Figure 23 – Above is a plumbing analogy that uses a faucet or valve handle to simulate the actions of the source, gate and drain of an FET transistor.

Figure 24 – Of course, things get a little more complicated when you actually try to build them. It turns out that the DNA and RNA strands get sucked through the hole by the electric field too quickly to be measured. To slow down the process, a motor protein was inserted into the top of the Alpha-hemolysin protein lining the hole to ratchet the DNA and RNA strands down through the hole one base at a time so that there was enough time to measure the current disruption caused by each base as it passed through the hole.

Figure 25 – The top of the Alpha-hemolysin protein forms a flange that the motor protein can easily fit into. You will find a similar flange in the floor under your toilets.

Figure 26 – The top of the Alpha-hemolysin protein forms a flange that the motor protein can easily fit into. You will find a similar flange in the floor under your toilets.

Figure 27 – The motor protein ratchets the DNA and RNA strands through the hole like the film advance mechanism on old-fashioned movie film projectors.

But there was just one problem with the naturally occurring Alpha-hemolysin protein. The neck on its flange-like shape was just a little too long. That meant that about 10 - 12 bases would always be in the neck of its flange even if the motor protein was able to ratchet just one base at a time into the throat of its flange neck. That meant that many bases would always be clogging up the throat of its flange neck at the same time. We have all seen toilets in a similar condition. That produced some very complex variations in the current of ions trying to pass through the neck of the Alpha-hemolysin protein that were very hard to analyze. The natural solution was to shorten the neck of the flange and that was done by bioengineering two protein molecules called CsgG and CsgF to combine together into a new complex with a very short flange neck to replace the Alpha-hemolysin protein.

Figure 28 – The very long flange neck of the Alpha-hemolysin protein was replaced by a CsgG-CsgF complex with a much shorter flange neck. This made it easier to read the bases along a strand of DNA or RNA because it reduced the number of bases that were in the flange neck at the same time.

But even the CsgG-CsgF complex had a flange neck that was too long. To solve the problem, Deep Learning neural networks are used to identify 5-base stretches of DNA or RNA at a time. Each 5-base stretch of bases is called a k-mer. A k-mer of DNA bases is a substring of length k in a DNA sequence. For example, all 2-mers of the sequence AATTGGCCG are AA, AT, TT, TG, GG, GC, CC, CG. Similarly, all 3-mers of the sequence AATTGGCCG are AAT, ATT, TTG, TGG, GGC, GCC, CCG. For the CsgG-CsgF complex, a 5-mer is used, which has 4 x 4 x 4 x 4 x 4 = 1024 possible combinations. The Deep Learning neural networks are trained using synthesized 5-mer lengths of DNA and RNA bases with known sequences so that the Deep Learning neural networks learn to recognize the 5-mer substrings of bases. They can do this with over 99% accuracy. This is harder than it sounds because the bases in the throat of the CsgG-CsgF flange are bouncing around and introducing thermal noise into the ion current flow through the throat.
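The k-mer definition above amounts to sliding a window of length k along the sequence one base at a time. Here is a minimal Python sketch of that definition only. The function name kmers is made up for this example, and the sketch has nothing to do with how a real basecaller is implemented.

```python
def kmers(sequence: str, k: int) -> list:
    """Return every substring of length k, in order, including repeats."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


sequence = "AATTGGCCG"
print(kmers(sequence, 2))
print(kmers(sequence, 3))
print(4 ** 5)   # number of distinct 5-mers over the four bases
```

A sequence of n bases contains n - k + 1 overlapping k-mers, so each base in the middle of a strand is seen in k different windows as it is ratcheted through the pore.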

Déjà vu All Over Again
All of this very complicated mechanical and electrical engineering on a molecular level seemed strangely familiar to me. Then it suddenly dawned on me. These people were drilling and logging oil wells at the molecular level on biological membranes! As you may recall from Introduction to Softwarephysics, I started out in 1975 as an exploration geophysicist exploring for oil, first with Shell and then with Amoco, before transitioning to IT in 1979. As a geophysicist by training, I am now greatly concerned by the devastation of the climate change we are now seeing unfolding before our very eyes as I outlined in Last Call for Carbon-Based Intelligence on Planet Earth and This Message on Climate Change Was Brought to You by SOFTWARE. But since the nanopore sequencing industry is so new, perhaps there is something they can learn from the oil industry as they continue to "make hole" in the industry parlance.

Figure 29 – Above is a completed production oil well. The finished borehole below the drilling rig has penetrated many membrane layers of rock and is lined with a steel casing pipe, similar to the Alpha-hemolysin protein, to keep the hole open and allow for the control of the fluids in the borehole. The steel casing is cemented to the borehole walls and at the productive layers that contain oil or natural gas the casing is perforated to allow the oil and natural gas to enter the well.

Figure 30 – Above are the basic parts of a drilling rig. At the base is a rotary table that spins at about 50 - 250 rpm. The Kelly bushing can clamp onto the rotary table when drilling. This causes the Kelly pipe above the Kelly bushing to spin. The Kelly pipe contains the top segment of the drill pipe that has just been attached to the drill string of the drill pipe. The Kelly pipe can move up and down the Kelly bushing as drilling proceeds. So when the rotary table begins to spin, the Kelly bushing begins to spin, causing the Kelly pipe to spin and ultimately all of the drill pipe in the drill string to rotate. Drill pipe comes in lengths of about 30 feet. After the latest segment of drill pipe has gone down the hole, the Kelly pipe can be raised through the Kelly bushing to allow the next segment of drill pipe to be added at the top. So the rotary table, Kelly bushing and Kelly pipe perform the same function as the motor protein in a nanopore sequencer that ratchets one base at a time down the cased nanopore hole. Similarly, the rotary table, Kelly bushing and Kelly pipe ratchet the drill string down the hole one 30-foot length of drill pipe at a time.

Figure 31 – Above, a roughneck is handling the Kelly pipe above the Kelly bushing and the rotary table on a drilling rig.

Figure 32 – Above is a spinning rotary table, Kelly bushing and Kelly pipe. The rotary table is usually driven by a diesel engine or an electric motor.

Figure 33 – Drill pipe comes in 30-foot lengths and is connected together by tapered threaded ends.

Figure 34 – At the very end of the drill string is the drill bit. As the whole drill string is rotated by the rotary table, Kelly bushing and Kelly pipe, the drill bit grinds through the rock at the bottom of the hole. In nanopore sequencing, the DNA or RNA drill string is pulled through the membrane hole by the electrical force arising from the voltage difference between the top and bottom of the membrane. In drilling an oil well the drill string is pulled through the hole by the gravitational force arising from the heavy drill string bearing down on the rotating drill bit. Heavy drilling mud is pumped down through the drill pipe of the drilling string to lubricate and cool the drill bit. The heavy drilling mud also brings up the rock cuttings to the surface and prevents pressurized formation water from entering the hole before it is cased.

Figure 35 – Just under the drill floor of the drilling rig is a blowout preventer that seals off the borehole in case the drill bit enters a formation with fluid pressures much higher than what the drilling mud can handle.

Figure 36 – The Spindletop oil well blowout occurred on January 10, 1901. The Lucas Gusher, as it was called, blew oil over 150 feet into the air at a rate of 100,000 barrels per day. Blowouts are very bad for the environment and very dangerous too because they can catch fire. It took nine days to bring the well under control. The Spindletop oil field discovery led the United States into the oil age. Prior to Spindletop, oil was primarily used for lighting and as a lubricant. After Spindletop, oil became the primary source of energy for the country.

So You Have a Hole in the Ground - Now What?
When you drill a hole into the ground you are actually drilling a hole into time. That is because the layers of rock in a sedimentary basin get older and older as you drill down. In geology, the kinds of layers you drill through and their relative sequence in time are very important. That is because each layer of sedimentary rock was laid down by a particular environment. It might have been laid down by a sandy beach producing sandstone or a muddy delta producing shale. The sequence of the layers of sedimentary rock is even more important because it tells the geological history of the region as the depositional environment changes with time. So for a geologist, the rock layers and the sequence of the rock layers are just as important as the bases and the sequence of bases along a stretch of DNA or RNA are to a biologist.

But when you are drilling an oil well, how do you tell what rock layers are down there and their sequence? You need some way of reading the sequence of layers in the hole just like a nanopore sequencer needs to read the bases along a stretch of DNA or RNA or a tape read/write head needs to read the bytes along a stretch of tape. For oil wells and water wells, this is done using well logs that look very much like the outputs of a nanopore sequencer!

It all began with the Schlumberger brothers early in the 20th century. Conrad Schlumberger was a physicist and Marcel Schlumberger was an engineer. Conrad Schlumberger had been interested in using electrical resistivity to detect ore deposits in the ground since the early 1910s. He began by experimenting with rocks in his bathtub. In 1912, he recorded the very first map of equipotential curves at his estate near Caen in Normandy, France. The resulting map confirmed the method's ability to detect metal ores and reveal features of the subsurface structure.

Figure 37 – Conrad Schlumberger's resistivity surveys consisted of a battery connected to two electrodes stuck into the ground. The potential difference between these two electrodes caused a current to flow in the ground. These currents were then detected by two other electrodes that were stuck into the ground. A voltmeter connected to these sensing electrodes measured the voltage difference caused by the electrical currents in the ground. By moving all of these electrodes back and forth along a line on the ground one could conduct a resistivity survey. When the electrodes were further apart they measured the resistivity of the rock deeper underground.

Figure 38 – Above we see Conrad Schlumberger in the foreground and Marcel Schlumberger in the background conducting fieldwork. Their resistivity surveys were very successful at locating ore bodies and salt dome oil fields. The Spindletop field was a salt dome oil field.

Figure 39 – But the real payoff came when Conrad Schlumberger decided to drop his gear down an oil well. In 1927, Conrad Schlumberger and his son-in-law, Henri Doll, designed and built the first electrical resistivity well logging tool. The tool consisted of a series of electrodes that were lowered into a well on a cable. The electrodes measured the resistance of the rock formations surrounding the wellbore. In the above diagram, electrode A is connected to a battery on the truck. The other electrode of the battery was connected to the well casing that had already been set. Electrodes N and M measured the voltage difference between two points in the borehole as all three electrodes were slowly pulled up by a cable. The log of the voltage differences was sent up to the recording truck and recorded on paper. The Schlumberger brothers' first resistivity log was recorded in a well in Pechelbronn, France, on September 5, 1927. The log clearly showed the different rock formations in the well, including the oil-bearing sandstone.

Figure 40 – Then in 1931 one of the Schlumberger brothers goofed. The A electrode was not connected to the battery as usual. But to the surprise of all, the N and M electrodes were still measuring voltages as they were slowly pulled out of the hole! This was the accidental discovery of the SP (Spontaneous Potential or Self Potential) well log. For some reason, the layers of sedimentary rocks were acting like little batteries all by themselves. The strange thing was that the SP logs were even better than the resistivity logs at seeing the various layers in the borehole and their sequences. To do an SP log all they had to do was stick an electrode in the ground at the surface and then lower the other electrode on a cable. Batteries not included.

Figure 41 – After much research, the oil industry figured out what was going on. Impermeable shale carried a positive charge on the surface of the borehole and permeable sandstone carried a negative charge. This produced an SP potential that the electrode on the cable could measure. Strangely, the voltage between the shale and sandstone layers is called the membrane potential.

Figure 42 – Over the years many other logging tools were invented, but the SP and resistivity logs are still very useful. Above we see an SP log and a gamma-ray log. The gamma-ray log is obtained by a scintillation counter that is lowered on a cable. The scintillation counter measures the amount of gamma rays in the rock layers. Sandstone consists mainly of quartz sand which is not radioactive. Shale is formed from muddy clay that contains more radioactive elements like uranium, thorium and potassium. Notice how the two logs correlate. The sandstone layers have a lower SP and gamma-ray count than the shale layers. Notice their similarity to the sequence log of DNA bases shown in Figure 2 above.

Conclusion
The first commercial nanopore DNA sequencer finally came to market in 2014, 25 years after Dave Deamer's first insights in 1989. It took a great deal of work and perseverance by many people in the face of many naysayers to make that happen. Again, to fully appreciate the history of nanopore DNA sequencing be sure to take a look at:

Nanopore Sequencing
https://www.whatisbiotechnology.org/index.php/science/summary/nanopore/nanopore-sequencing-makes-it-possible-to-decode-the

I do not know about you, but I smell a Nobel Prize simmering in the kitchen.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston


Friday, September 22, 2023
