This week I added a new tool to BIG, as well as a couple more articles in the parables section. This post explains.
I posted a new tool on the BIG app called the Genome Browser. This allows inspection of the entire human genome.
The main goal of this tool is to show that we have control over the downloaded genome files. We do that by showing we can look at all the raw data available in the files. The tool's layout is designed to address the data as a biologist would think about the genome, but it is set up to eventually show Paleo text with a left-to-right reading direction.
If you look carefully you can see the 12 different ways biologists view sequence data. I have labeled those step 0 through step 11 following the addressing article I linked last week.
By inspecting the tool you can see that the first 6 rows are, in various ways, symmetric to the bottom 6 rows. Because we are looking at the way codons can carry language information, we don't always care about the symmetric cases.
This implies the information carrying capacity of the genome can be expressed in 6 rows only. This will speed up some of the analysis that is coming later.
At present the genome browser is a page on a bigger site. But the tool very much wants to be a standalone app. Almost immediately I found myself wanting to use the keyboard to move around in the genome the same way the app uses keys to move around articles.
As a standalone app, the genome browser could use the app's menu system to navigate through the genome itself. This is important considering the amount of data that can be presented. I will wait on building that app until I see how this develops.
As it is, simple textual navigation buttons are sketched into the page itself. This works but is crude.
I have also added 2 more articles to the parable section on big.paleo.in.
The first looks at the tree in Nebuchadnezzar's dream at the start of the book of Daniel. This tree seems to be a picture of the END of the matter of recovering the inspired text, so it is another parable hinting at the scale of inspired text.
The second new parable article looks at the double flows of a river in the book of Zechariah. This may also point at the 2 different directional flows of the DNA molecule itself, which in turn may point at a possible application of unused information carrying capacity in the genome.
Information Carrying Capacity
Each codon can express 64 different states of information. The Paleo alphabet only requires 25 different states, 1 state for each letter. There is a mismatch in the carrying capacity between these 2 systems. Either some of the carrying capacity of the DNA molecule is left unused, or else more is being encoded than we have considered so far. Here are 2 interesting possibilities.
It might be the case that the text is bi-capitalized. When we first worked out fonts for Paleo we put in lower case Paleo letters. Our Paleo keyboard layouts support lower case letters too. These may be used for grammatical letters. Since punctuation does not have any meaning in a bi-capitalized string, there would be 25 + 22 = 47 different glyphs encoded by the 64 cases of the codons. 47 out of 64 is still not complete coverage, but better.
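The mismatch described above can be put into numbers. The following is only a rough sketch: the glyph counts (25 and 25 + 22 = 47) come from the text, while the function name `coverage` and the use of bits as a measure are my own illustrative choices, not part of the BIG tools.

```python
import math

BASES = 4          # A, C, G, T
CODON_LENGTH = 3   # bases per codon

# Number of distinct states one codon can express: 4^3 = 64.
codon_states = BASES ** CODON_LENGTH


def coverage(glyphs, states=codon_states):
    """Return (unused states, fraction of states used, bits needed per glyph).

    Hypothetical helper for illustration only.
    """
    return states - glyphs, glyphs / states, math.log2(glyphs)


# Glyph counts taken from the text: 25 letters, or 25 + 22 when bi-capitalized.
for label, glyphs in [("upper case only", 25), ("bi-capitalized", 25 + 22)]:
    unused, fraction, bits = coverage(glyphs)
    print(
        f"{label}: {glyphs} glyphs, {unused} codon states unused, "
        f"{fraction:.0%} of states used, "
        f"{bits:.2f} of {math.log2(codon_states):.0f} bits per codon"
    )
```

Either way a codon carries 6 bits, and neither alphabet needs all of them. That gap is the excess capacity discussed here.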
The other possible way the carrying capacity of the codons might be used is through a second string of text running opposite to the primary string. This could be done by using 2 different reference frames, with potentially dramatic results. The don't-care entries in the codon table would mark base pairs that are only used on the opposite strand running the opposite way.
2 texts woven together this way would not be completely independent, since there does not appear to be enough excess carrying capacity for independent stories. But the 2 texts would not be a simple letter-for-letter substitution of each other either.
Both the bi-capitalized and bidirectional cases are interesting to consider. But we won't know for sure how that excess carrying capacity is actually used, if at all, until the codon-to-Paleo letter table is worked out.
With as much data as is in the genome, I usually want to see 2 different code bases that can access that data independently. This makes bugs easier to find.
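To make the opposite-strand idea concrete, here is a minimal sketch of how the same stretch of DNA reads as codons in both directions. It relies only on standard base pairing (A with T, C with G) and the fact that the 2 strands run antiparallel. The sample sequence and the function names are made up for illustration, and no codon-to-letter table is assumed.

```python
# Standard base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own direction."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, frame=0):
    """Split a sequence into complete 3-base codons, starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


primary = "ATGGCCTTA"                   # made-up sample, not real genome data
opposite = reverse_complement(primary)

print(codons(primary))    # ['ATG', 'GCC', 'TTA']
print(codons(opposite))   # ['TAA', 'GGC', 'CAT']
```

Each codon on the opposite strand is fully determined by a codon on the primary strand, which is why 2 texts sharing the same base pairs could not be fully independent of each other.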
Any results in the first code base can be checked against the other. The first code base for this project is the genome browser I discussed above. The second code base for this project is a library for use in shell programs for running tests and audits against the genome. I will turn my attention to that second code base next week.
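The cross-checking idea can be sketched in a few lines. This is not the genome browser or the shell library. The 2 functions below are hypothetical stand-ins for 2 independently written code paths that answer the same question (here, base counts for a sequence), with a check that they agree.

```python
from collections import Counter


def base_counts_a(seq):
    """First implementation: lean on the standard library's Counter."""
    return dict(Counter(seq.upper()))


def base_counts_b(seq):
    """Second implementation: written independently as a plain loop."""
    counts = {}
    for base in seq.upper():
        counts[base] = counts.get(base, 0) + 1
    return counts


def cross_check(seq):
    """Run both implementations and fail loudly if they disagree."""
    a = base_counts_a(seq)
    b = base_counts_b(seq)
    if a != b:
        raise ValueError(f"implementations disagree: {a} vs {b}")
    return a


# Made-up sample sequence, not real genome data.
print(cross_check("ATGGCCTTA"))
```

A disagreement does not say which side is wrong, only that there is a bug to look for in one of them.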
We are traveling this week, so work is slow. Regular work should pick up again next week.