Paleo.In

Geeks In The Room

Feedback from the past few posts on the manuscript recovery work has been very strong. We have a bunch of technically trained readers who are very interested in this work. So in this post I dive deep into the design of the code. If you are not a geek, perhaps come back next week? Otherwise, read on for more.

Last Week

Picking up from last week's post, at the time of writing that post I had the new recursive solver working. Using the commandments as a test case, it generated over 1000 possible Eve documents from what turned out to be 4 trillion inbound test cases. (I will always round down when giving these sorts of counts.)

At the time I did not have the code to look at each outbound test case. But soon after writing that blog post I could look at results, displaying the Adam text we know alongside the 1000 candidate unseen Eve texts.

My last attempt at this was code written in June. It ran as a batch process, generating text files as output. That workflow is slow to set up, but stock text editors are fast and can display huge files. So with that code it was easy to inspect output.

This time around there is a web-based user interface. When finished it will allow setting up and running massive numbers of test cases with a few taps of a finger on a tablet. But web interfaces hate huge data sets.

As soon as I added display for those 1000 results, the UI bogged way down. And, it was obvious I needed much better tools for attempting to read the Eve side. Who am I to say what Eve says? Reading Eve by machine is going to take much more work.

It was immediately obvious it was time to get a good overall design for how this entire thing is going to work. The design has to break the problem down into pieces small enough that a browser window never overloads. It must also make sense from the perspective of someone looking at it for the first time.

Multiple Views

I have been bothered by the tyranny of the urgent recently. Looking for break-in instead of looking for the solve. At this point I decided to stop worrying about getting this done fast, and start to worry about getting it done at all.

So, I sketched out a prototype for a new UI. It currently looks like there are 10 different views into the problem of solving for the text. The test case generator itself being the first of those, already roughly working.

Remember, the biggest part of this problem is manuscript preparation, both of the historical docs and a feedback loop from inspired Eves. In the June version this is handled in massive text files and batch jobs. To do this right will require a custom UI driven mostly by touch. This part of the problem will take 3 more views.

The results themselves, those 1000 I discussed above, will take another view. It must be designed to scroll through specific results lists. This will allow extensive annotation for each test result while staying mindful of limits in the browser.
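One common way to keep a browser responsive with a long results list is to hand it one page at a time. The following is only a minimal sketch of that idea, not the code in the app. The function name getPage, the page size, and the shape of the returned object are all assumptions made for illustration.

```javascript
// Hypothetical helper: return one page of a long results list so the
// browser only ever has to render pageSize items at once.
function getPage(results, pageNumber, pageSize) {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  if (!Number.isInteger(pageNumber) || pageNumber < 0) {
    throw new RangeError('pageNumber must be a non-negative integer');
  }
  const start = pageNumber * pageSize;
  return {
    page: pageNumber,                                 // zero-based page index
    pageCount: Math.ceil(results.length / pageSize),  // total number of pages
    total: results.length,                            // total number of results
    items: results.slice(start, start + pageSize),    // empty past the last page
  };
}

// Usage: 1000 placeholder results, shown 25 at a time.
const candidates = Array.from({ length: 1000 }, (_, i) => 'candidate ' + i);
const firstPage = getPage(candidates, 0, 25);
console.log(firstPage.items.length, 'of', firstPage.total);
```

With something like this on the server side, annotations could be attached per item on the current page while the rest of the list stays out of the browser.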

There is another view to watch the workload on the server. Even with a fast solver there are 31,000 verses and eventually all need to be tested. Eventually our computing power needs to be fully utilized 24/7, and we need to visualize that workload. It should be possible for a quick case to speed past a case taking a week to solve.
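To illustrate why a quick case can pass a slow one, here is a small simulation of a job queue with more than one worker. It is a sketch under stated assumptions, not the real job queue: the name simulateQueue is hypothetical, and each job is given a known duration in hours only so the result is deterministic. A real solver would not know its run time in advance.

```javascript
// Hypothetical simulation: jobs are handed out in submission order to
// whichever worker becomes free first.
function simulateQueue(jobs, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  // freeAt[i] is the hour at which worker i next becomes idle.
  const freeAt = new Array(workerCount).fill(0);
  const finished = jobs.map((job) => {
    // Pick the worker that frees up earliest (lowest index wins ties).
    let w = 0;
    for (let i = 1; i < freeAt.length; i++) {
      if (freeAt[i] < freeAt[w]) w = i;
    }
    const start = freeAt[w];
    freeAt[w] = start + job.hours;
    return { name: job.name, worker: w, start: start, end: freeAt[w] };
  });
  // Order by completion time to see which job finishes first.
  return finished.sort((a, b) => a.end - b.end);
}

// Usage: a week-long case is submitted first, a one-hour case second.
const order = simulateQueue(
  [{ name: 'week-long case', hours: 168 }, { name: 'quick case', hours: 1 }],
  2
);
console.log(order.map((job) => job.name + ' done at hour ' + job.end));
```

With two workers the quick case finishes at hour 1, long before the week-long case. With a single worker it would have to wait the full 168 hours, which is the situation the design is trying to avoid.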

The Adam and Eve documents themselves each need views. Adam being effectively the inspired Testimony. Eve being something new.

Then there are 2 system views. One is for managing the live interface to revision control, i.e. the exit path out of the app for the inspired documents. The other is a stats view to count all sorts of stuff and display it.

In my prototype for these views I used our normal 'article' based UI framework, adding a tool bar to each view to show the primary operations in each view. I am not trying to build a work of art, just something that runs and can be understood. With that done I turned to the server and build problems.

Security

This is turning into the first app where we put serious workload on the server. We did this years and years ago with PHP websites that had to compensate for browser differences. But sites with server workload are also sites that can be hacked. It usually takes staff to monitor for problems, more staff than we had. There are only 2 of us, after all. The fix was to not use code on the server at all. Browser compensations had to happen in the browser.

All of our current public facing sites treat the server as a read-only stop between our development machines and end-users. The feedback forms in the About pages being the only exception, and they only send data privately back to us. If that path was ever hacked, we would get a river of spam and we would know immediately and could stop the river quickly.

So to handle server security in this case, the server is NOT a public facing machine, and there is no serious risk from hackers. It is intended to support more than 1 person using it at a time, on a private network we already use for testing.

The account used for web servers is highly restricted, and there were configuration issues to get a working space on the server hard drives.

Build

Once the UI prototype was finished, and the rough plan for the server was done, I turned my attention to the build process.

This is a strange set of programs that build websites and all the related bits and pieces. I reworked that system last winter, but have not completely converted all of our current sites to the new system.

Our build environment in particular really needed a linker to handle server side JavaScript programs. Since we already use the Webpack linker for the browser, I was able to use that code as a model for new code for the server. The linker now runs for the main server service, cgi scripts and server side JavaScript apps if needed. Since those apps are now single files, the deploy process is very happy.
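For readers who have not bundled server code before, a Webpack configuration aimed at node rather than the browser can be as small as the sketch below. This is not the build code described above. The entry and output paths are hypothetical, and only the standard Webpack options target, mode, entry and output are used.

```javascript
// webpack.config.js (illustrative sketch only; paths are assumptions)
const path = require('path');

module.exports = {
  target: 'node',            // bundle for node, leaving built-ins like fs and path unbundled
  mode: 'production',
  entry: './src/server.js',  // hypothetical entry point of a server side app
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'server.js',   // the single file that gets deployed
  },
};
```

The appeal for deployment is that one output file is copied to the server instead of a tree of packages.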

JavaScript is a really nice, fast language. But the inventor of Node gave a long talk about his failures in the area of packaging. Linkers fix the worst of his design errors.

Services

Finally, server side apps are typically called services because they start running when the server machine is turned on. You can't turn on a server by picking up a tablet and clicking. The tablet wants a website already there and already running, waiting to do someone's bidding.

All the weird configuration bits to make all the right services run come from build environment tools too. Those got updated and configured this week. Nginx is used for all static files. uWSGI is used for synchronous cgi scripts. Init.d runs forever, which runs node, which runs the server for the solver, especially the job queue. Under the job queue is where the compiled C code app for solving will run.

So now it is possible to build and deploy the whole thing with a couple of simple commands. The server comes up and is ready to go as soon as the build finishes. Very nice, and time to go back to making the code work.

Git

One of the views I sketched out above is the git view. Git is Linus Torvalds's revision control system. Since the outbound side of this riddle will deliver text to git, it made sense to give git a UI inside this new app.

I then realized there were several other git projects that should go through the same interface instead of through the build system. Basically, certain projects already found in git get tied to the running server instead of to the build environment. Most important is the git project that holds the texts passed to us by history.

Finally, today, Friday, I got the UI for git mostly working, and all the git projects that are needed can come directly to the server as needed. This also needed configuration changes in the git server itself since the web server behaves like a person to revision control.

More Views

I expect each of the 8 remaining views to take between 1 and 3 days of work to get sketched out. So not much more to report code-wise for another couple of weeks.

Hopefully the Geeks in the room are satisfied?

More Later,

Phil