This was my initial part 2 in a series about a project to read/import a large collection of home-made optical media: imaging discs. Part 1 was , the summary page for the whole project is
Last time we prepared for the import by gathering all our discs together and organising storage for them in two senses: real-world (i.e. spindles) and a more future-proof digital storage system for the data, in my case, a NAS. This time we're actually going to read some discs. I suggest doing a quick first pass over your collection to image all the trouble-free discs (and identify the ones that are going to be harder to read). We will return to the troublesome ones in a later part.
For reading home-made optical discs, you could simply use
cp /dev/sr0 disc-image.iso
This has the attraction of being a very simple solution but I don't recommend
it, because of a lack of options for error handling. Instead I recommend using
GNU ddrescue. It is designed to be fault
tolerant and retries bad sectors in various ways to try and coax every last
byte out of the medium. Crucially, a partially imported disc image can be
further added to by subsequent runs of
ddrescue, even on a separate computer.
For the first import, I recommend the options suggested in the ddrescue manual:
ddrescue -n -b2048 /dev/cdrom cdimage.iso cdimage.log
This will create a
cdimage.iso file, hopefully containing your data, and a
cdimage.log, describing what
ddrescue managed to achieve. You
should archive both!
This will either complete reasonably quickly (within one to two minutes), or will run potentially indefinitely. Once you've got a feel for how long a successful extraction takes, I'd recommend terminating any attempt that lasts much longer than that, and putting those discs to one side in a "needs attention" pile, to be re-attempted later. If
ddrescue does finish, it will tell you if it couldn't read any of the disc. If so, put that disc in the "needs attention" pile too.
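As a rough sketch of that routine, the two shell functions below wrap the same ddrescue command in a time limit and then check the log for anything that was not read. The function names (image_disc, log_is_complete) and the 300 second default are placeholders of mine, not part of ddrescue. It assumes timeout from GNU coreutils is installed, and that the log follows the usual layout: comment lines starting with #, one status line, then one line per block ending in a status character, where + means the block was read successfully. ddrescue should write out its log when it is interrupted, but check that on your own setup before relying on it.

```shell
#!/bin/sh
# Sketch only: helper names and the default time limit are illustrative.

# image_disc DEVICE NAME [LIMIT_SECONDS]
# Runs the first-pass ddrescue command, giving up after LIMIT_SECONDS.
# timeout(1) exits with status 124 if the limit was reached.
image_disc() {
    dev=$1
    name=$2
    limit=${3:-300}
    timeout "$limit" ddrescue -n -b2048 "$dev" "$name.iso" "$name.log"
}

# log_is_complete LOGFILE
# Succeeds only if the log has at least one block line and every
# block line is marked '+' (finished).
log_is_complete() {
    awk '
        /^#/ || NF == 0 { next }                 # skip comments and blank lines
        !seen_status { seen_status = 1; next }   # first data line is the status line
        { blocks++; if ($3 != "+") bad++ }       # block lines: pos size status
        END { if (blocks > 0 && bad == 0) exit 0; exit 1 }
    ' "$1"
}
```

You might use it like this, with the device and name adjusted to suit: image_disc /dev/sr0 my_files.blah5 && log_is_complete my_files.blah5.log || echo "needs attention". Either a timeout or an incomplete log sends the disc to the "needs attention" pile.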
Above, I wrote that I recommend this approach for home-made data discs. Broadly, I am assuming that such discs use a limited set of the options and features available to disc authors: they'll either be single session, or multisession but you aren't interested in any files that are masked by later sessions; they won't be mixed mode (no audio tracks); there won't be anything unusual or important stored in the disc metadata, title, or subcodes; et cetera.
This is not always the case for commercial discs, or audio CDs or video DVDs.
For those, you may wish to recover more information than is available to you
with ddrescue. These aren't my focus right now, so I don't have much advice
on how to handle them, although I might in the future.
labelling and storing images
If your discs are labelled as poorly or inconsistently as mine, it might not be
obvious what filename to give each disc image. For my project I decided to append a new label to all imported discs, something like "blahX", where X is an incrementing number. So, for a fifth disc being imported with the label "my files", the image name would be
my_files.blah5.iso. If you are keeping the physical discs after importing them, you could also mark the disc with "blah5".
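If you want to generate these names rather than type them, a minimal sketch might look like the following. The function name image_name is made up for illustration, "blah" stands in for whatever prefix you choose, and the only clean-up it does is turning spaces into underscores; real labels may need more than that.

```shell
#!/bin/sh
# Sketch only: image_name is a hypothetical helper, "blah" a placeholder prefix.

# image_name LABEL NUMBER
# Prints a filename of the form label_with_underscores.blahNUMBER.iso
image_name() {
    label=$(printf '%s' "$1" | tr ' ' '_')   # replace spaces with underscores
    printf '%s.blah%s.iso\n' "$label" "$2"
}
```

For example, image_name "my files" 5 prints my_files.blah5.iso.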
where are we now
You should now have a pile of discs that you have successfully imported, a corresponding collection of disc image files/ddrescue log file pairs, and possibly a pile of "needs attention" discs.
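Since the log files matter as much as the images, it can be worth checking that none have gone astray. This is a small sketch under the assumption that each image is named something.iso and its log something.log in the same directory, as in the ddrescue command above; missing_logs is a name I've invented for the example.

```shell
#!/bin/sh
# Sketch only: assumes NAME.iso / NAME.log pairs living in one directory.

# missing_logs DIR
# Prints each .iso file in DIR that has no matching .log file.
missing_logs() {
    for iso in "$1"/*.iso; do
        [ -e "$iso" ] || continue                          # glob matched nothing
        [ -e "${iso%.iso}.log" ] || printf '%s\n' "$iso"   # no log for this image
    done
}
```

Running missing_logs on your image directory should print nothing if every image still has its log.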
In future parts, we will look at how to explore what's actually on the discs we have imaged; how to handle partially read or corrupted disc images; how to map the files on a disc to the sectors you have read, to identify which files are corrupted; and how to try to coax successful reads out of troublesome discs.