Below are the five most recent posts in my weblog. You can also see a chronological list of all posts, dating back to 1999.
This is part 2 in a series about a project to read/import a large collection of home-made optical media. Part 1 was Imaging DVD-Rs: Overview and Step 1; the summary page for the whole project is imaging discs.
Last time we prepared for the import by gathering all our discs together and organising storage for them in two senses: real-world (i.e. spindles) and a more future-proof digital storage system for the data, in my case, a NAS. This time we're actually going to read some discs. I suggest doing a quick first pass over your collection to image all the trouble-free discs (and identify the ones that are going to be harder to read). We will return to the troublesome ones in a later part.
For reading home-made optical discs, you could simply use
cp /dev/sr0 disc-image.iso
This has the attraction of being a very simple solution, but I don't recommend it, because it offers no options for error handling. Instead, I recommend GNU ddrescue. It is designed to be fault tolerant and retries bad sectors in various ways to coax every last byte out of the medium. Crucially, a partially imaged disc can be extended by subsequent runs of ddrescue, even on a separate computer.
For the first import, I recommend the suggested options from the ddrescue documentation:
# -b2048 matches the 2048-byte sector size of data CDs/DVDs; -n skips
# the slow scraping of bad areas, which we defer to later retry passes
ddrescue -n -b2048 /dev/cdrom cdimage.iso cdimage.log
This will create a cdimage.iso file, hopefully containing your data, and a cdimage.log file describing what ddrescue managed to achieve. You should archive both!
This will either complete reasonably quickly (within one to two minutes), or will run potentially indefinitely. Once you've got a feel for how long a successful extraction takes, I'd recommend terminating any attempt that lasts much longer than that, and putting those discs to one side in a "needs attention" pile, to be re-attempted later. If ddrescue does finish, it will tell you if it couldn't read any part of the disc. If so, put that disc in the "needs attention" pile too.
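If you want to triage discs programmatically, the ddrescuelog tool that ships with GNU ddrescue can inspect the log (map) file; here's a minimal sketch, assuming the cdimage.log name from above:

# --done-status (-D) exits 0 only if the mapfile records every
# sector as successfully rescued
ddrescuelog -D cdimage.log && echo complete || echo "needs attention"

Because progress lives in that log file, re-running the same ddrescue command against the same image and log pair will retry only the sectors that haven't yet been recovered, which is what makes the resumption described above work.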
Above, I wrote that I recommend this approach for home-made data discs. Broadly, I am assuming that such discs use only a limited subset of the options and features available to disc authors: they'll either be single-session, or multi-session but with no files you care about masked by later sessions; they won't be mixed mode (no audio tracks); there won't be anything unusual or important stored in the disc metadata, title, or subcodes; et cetera.
This is not always the case for commercial discs, audio CDs, or video DVDs. For those, you may wish to recover more information than ddrescue can give you. They aren't my focus right now, so I don't have much advice on how to handle them, although I might in the future.
labelling and storing images
If your discs are labelled as poorly or inconsistently as mine, it might not be obvious what filename to give each disc image. For my project I decided to append a new label to every imported disc, of the form "blahX", where X is an incrementing number. So the fifth disc to be imported, labelled "my files", would get the image name my_files.blah5.iso. If you are keeping the physical discs after importing them, you could also mark the disc itself with "blah5".
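For illustration, here's a hypothetical shell sketch of that scheme (the my_files name and the blahX pattern are just the example above, not fixed conventions):

# find the highest blahN number used so far, then rename a fresh
# image/log pair with the next number in the sequence
last=$(ls *.blah*.iso 2>/dev/null | sed 's/.*\.blah\([0-9]*\)\.iso$/\1/' | sort -n | tail -1)
label="blah$(( ${last:-0} + 1 ))"
mv cdimage.iso "my_files.$label.iso"
mv cdimage.log "my_files.$label.log"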
where are we now
You should now have a pile of discs that you have successfully imported, a corresponding collection of disc image and ddrescue log file pairs, and possibly a pile of "needs attention" discs.
In future parts, we will look at how to explore what's actually on the discs we have imaged: how to handle partially read or corrupted disc images; how to map the files on a disc to the sectors you have read, to identify which files are corrupted; and how to try to coax successful reads out of troublesome discs.
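In the meantime, a quick sanity check you can already perform on a successfully imaged disc is to loop-mount the image read-only and eyeball its contents; a minimal sketch, assuming a standard ISO9660/UDF filesystem and an existing /mnt mount point:

# mount the image read-only via a loop device, list it, unmount
sudo mount -o loop,ro cdimage.iso /mnt
ls /mnt
sudo umount /mnt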
Tomorrow marks my 10th anniversary on Twitter. I have mixed feelings about the occasion. Twitter has been both a terrific success and a horrific failure. I've enjoyed it, I've discovered interesting people via Twitter and had some great interactions. I certainly prefer it to Facebook, but that's not a high benchmark.
Back in the early days I tried to engage with Twitter the way a hacker would. I worked out a scheme to archive my own tweets. I wrote a twitter bot. But Twitter became more and more hostile to that kind of interaction, so I no longer bother. Anything I put on Twitter I consider ephemeral. I've given up backing up my own tweets, conversations, or favourites. I deleted the bot. I keep a "sliding window" of recent tweets, outside of which I delete (via tweetdelete). My window started out a year wide; now it's down to three months.
Aside from the general hostility to third parties wanting to build on the Twitter platform, they've also done a really poor job of managing bad actors. Of the tools they do offer, they save the best for people with "verified" status: ostensibly a system for preventing fakes, now considered by some a status symbol. Twitter have done nothing to counter this; in fact they've actively encouraged it, by withdrawing it in at least one case from a notorious troll as an ad-hoc form of punishment. For the rest of us, the tools are woefully inadequate. If you find yourself on the receiving end of even a small pocket of bad attention, Twitter becomes effectively unusable for hours or days on end. Finally, the troll-in-chief (and now President of the US) is inexplicably still permitted on Twitter despite repeatedly and egregiously violating their terms of service, demonstrating that there are different rules for some folks than for the rest of us.
(By the way, I thoroughly recommend looking at Block Lists/Bots. I'm blocking thousands of accounts, although the system I've been using appears to have been abandoned. blocktogether.org might be worth a look; I intend to try it at some point.)
To some extent, Twitter is responsible for, if not the death, then the mortal wounding of blogging. Back in the dim and distant past, we'd write blog posts for our idle thoughts (e.g.), and those have migrated quite comfortably to tweets, but the shift seems to have had a sapping effect on people writing longer-form stuff too. Twitter isn't the only culprit: Google sunsetting Reader in 2013 was an even bigger blow, and I've still not managed to find something to replace it. (Plenty of alternatives exist, but the habit has died.)
One of the well-meaning, spontaneous things that came from the Twitter community was the notion of "Follow Friday": on Fridays, folks would nominate other interesting folks you might like to follow. In that spirit, and wishing to help boost the idea of blogging again, I'd like to nominate some interesting blogs that you might enjoy (and feel free to recommend me more blogs to read in the comments!):
- Vicky Lai first came up on my radar via Her One Bag, documenting her nomadic lifestyle (Hello UltraNav keyboard, and Stanley travel mug!), but her main site is worth following, too. Most recently she's written up how she makes her twitter ephemeral using AWS Lambda.
- Alex Beal, who I have already mentioned.
- Chris Siebenmann, a UNIX systems administrator at the University of Toronto. Siebenmann's blog feels to me like it comes from a parallel universe where I stuck it out as a sysadmin and got institutional support to do the job justice (I didn't, and I didn't).
- Darren Wilkinson writes about statistics, computing, data science, Bayes, stochastic modelling, systems biology, and bioinformatics.
- Friend of the family Mina writes candidly and brilliantly about her journey beating lymphoma as a new mum at Lymphoma, Raphi and me.
- Ashley Pomeroy writes infrequently, eclectically, and surreally on a range of topics, from the history of the PlayStation 3 and running old games on modern machines to photography and ThinkPads.
- Ted Unangst writes with clarity and explains some of the design decisions that have gone into OpenBSD. Sometimes I wish we could achieve similar things in Debian (as I wrote last November).
- I was probably the last person on Earth to discover Raymond Chen's (of Microsoft) "The Old New Thing".
Finally, a more pleasing decennial: this year marks 10 years since my first uploaded package for Debian.
Every now and then, for one reason or another, I am sat in front of a Linux-powered computer with the graphical user interface disabled, instead using an old-school text-only mode.
There's a strange, peaceful quality about these environments.
When I first started using computers in the 90s, the Text Mode was the inferior, non-multitasking system that you generally avoided unless you were trying to do something specific (like run Doom without any other programs eating up your RAM).
On a modern Linux (or BSD) machine, unless you are specifically trying to do something graphical, the power and utility of the machine is hardly diminished at all in this mode. The surface looks calm: there's nothing much visibly going on, just the steady blink of the command prompt, as if the whole machine is completely dedicated to you, and is waiting poised to do whatever you ask of it next. Yet most of the same background tasks are running as normal, doing whatever they do.
One difference, however, is the distractions. Rather like when you drive out of a city to the countryside and suddenly notice the absence of background noise, background light, etc., working at a text terminal — relative to a regular graphical desktop — can be a very calming experience.
So I decided to take a fresh look at my desktop and see whether there were unwelcome distractions. For some time now I've been using a flat background colour to avoid visual clutter. After some thought I realised that most of the time I didn't need to see what was in GNOME3's taskbar. I found and installed this hide-top-bar extension and now it's tucked away unless I mouse up to the top. Now that it's out of the way by default, I actually put more information into it: the full date in the time display; and (via another extension, TopIcons Plus) the various noisy icons that apps like Dropbox, OpenBox, VLC, etc. provide.
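(If you want to do the flat-background part from the command line, here's a minimal sketch, assuming GNOME 3's standard org.gnome.desktop.background gsettings schema; the colour value is just an arbitrary example:)

# drop the wallpaper image and fall back to a single flat colour
gsettings set org.gnome.desktop.background picture-uri ''
gsettings set org.gnome.desktop.background picture-options 'none'
gsettings set org.gnome.desktop.background primary-color '#2e3436'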
There's still some work to do, notably in my browser (Firefox), but I think this is a good start.
A couple of weeks ago I gave a talk at the Third Annual UK System Research Challenges Workshop. This was my first conference attendance (let alone talk) in a while (3 years!), and my first ever academic conference, although the talk I gave was work-related: Containerizing Middleware Applications (abstract, PDF slides, paper/notes). I also gave a brief impromptu lightning talk about software preservation, with Chocolate Doom as a case study.
The venue was a Country Club/Spa Hotel quite near to where I live. For the first time I managed to fit a swim in every morning I was there, something I'd like to repeat next time I'm away at a hotel with a pool.
It was great to watch some academic talks. The workshop is designed to be small and welcoming to new researchers. It was very useful to see the quality and level of detail that fellow PhD students (further along with their research) are producing, and there were some very interesting talks (here's the programme).
Thanks to the sponsors (including my own employer) who made it possible.
I had considered giving a talk on my PhD topic, but it was not quite at the stage where I had something ready to share. I'm also aware I haven't written a word on it here either, and that's something I urgently want to address. My proposal is due quite soon and so I should have much to write about afterwards.
Can anyone recommend software for running a web service similar to archive.org?
We are looking for something similar to manage digital assets within the Computing History Special Interest Group.
One suggestion I've had is CKAN, which looks very interesting, but it is possibly more geared towards opening up an API to existing live data (such as a relational database of stuff, distributed or otherwise). We are mostly concerned with relatively static data sets: source code archives, collections of various types of publications, collections of images, etc.
(Having said that, there are some interesting possibilities for projects that consume the data sets in some fashion, perhaps via a web service, for e.g. reviewing OCR results for old raster scans of papers.)
I envisage something similar to the software powering archive.org. We want something that lets people explore collections of stuff via the web, potentially including machine-friendly APIs in some cases, but that also, ideally, lets us manage uploading and categorising items via the web as well.
I've also had suggestions to look at media-manager software, but what I've seen so far is designed for personal media collections like movies, photos, etc., and focussed more on streaming them to LAN clients.
Can anyone recommend something worth looking at?
Older posts are available on the all posts page.