
Notice
This mailinglist archive is frozen since May 2001, i.e. it will stay online but will not be updated.

archive organization (long, probably boring)


  • From: P3D Bob Wier <wier@xxxxxxxxxxxxxxx>
  • Subject: archive organization (long, probably boring)
  • Date: Mon, 6 Jan 1997 05:30:15 -0600

Since the topic of archives has come up, here's the current
organization (probably more than you ever wanted to know!).

There are two main areas. The original listserv structure has an
actual archives directory, but its drawback is that it just
accumulates giganto files of concatenated messages. These tend to
get rather large, which results in long download times and also makes
it really difficult to find a specific topic of interest. Here are the
files in the actual archive directory:

/ftp/pub/photo/photo-3d/archives
archives 8 % ls -al
total 5199
drwxr-xr-x   2 server   system      1024 Aug  8 13:32 ./
drwxr-xr-x  10 server   system       512 Apr 20  1996 ../
-rwxr--r--   1 server   system    501655 Apr 20  1996 archive_930309.txt.Z*
-rwxr--r--   1 server   system    768857 Apr 20  1996 archive_930730.txt.Z*
-rwxr--r--   1 server   system   1120837 Apr 20  1996 archive_940121.txt.Z*
-rwxr--r--   1 server   system   2423485 Apr 20  1996 archive_941019.txt.Z*
-rwxr--r--   1 server   system     32083 Feb 28  1996 archive_9412.Z*
-rwxr--r--   1 server   system     13049 Feb 28  1996 archive_9501.Z*
-rwxr--r--   1 server   system     19617 Feb 28  1996 archive_9503.Z*
-rwxr--r--   1 server   system     21598 Feb 28  1996 archive_9504.Z*
-rwxr--r--   1 server   system     23444 Feb 28  1996 archive_9505.Z*
-rwxr--r--   1 server   system     33808 Feb 28  1996 archive_9506.Z*
-rwxr--r--   1 server   system     25168 Feb 28  1996 archive_9507.Z*
-rwxr--r--   1 server   system      9851 Feb 28  1996 archive_9508.Z*
-rwxr--r--   1 server   system     22058 Feb 28  1996 archive_9509.Z*
-rwxr--r--   1 server   system     13154 Feb 28  1996 archive_9510.Z*
-rwxr--r--   1 server   system     24496 Feb 28  1996 archive_9511.Z*
-rwxr--r--   1 server   system     18835 Feb 28  1996 archive_9512.Z*
-rwxr--r--   1 server   system     36753 Feb 28  1996 archive_9601.Z*
-rwxr--r--   1 server   system     15615 Feb 28  1996 archive_9602.Z*

Note that these files are compressed via the UNIX "compress" program
(indicated by the file extension of capital .Z). To use them you'll need
a UNIX-style decompression utility for your particular platform.
Also note well that you need a fair amount of disk space to
hold the decompressed text file (e.g., the archive_941019.txt.Z* file
is 2.5 meg on the disk, but after decompression it will be about
5.5 meg). Also note that the asterisks are not actually part of the
filenames, but are a UNIX-style indicator flag (that is,
archive_941019.txt.Z* is really archive_941019.txt.Z). The earlier
dates in the filenames give a general indication of what time period
each file was in use, but each one contains several months of postings.
These are not indexed since they are too large to keep on the machine
in uncompressed format, and the search engine can't index compressed
text.
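
As a rough illustration of the two notes above (the trailing asterisk and
the disk space needed), here is a minimal Python sketch. It assumes the
listing lines look like the ones shown in this message; the function
names are made up for the example, and the expansion factor is just the
2.5 meg to 5.5 meg figure quoted above, not a measured average.

```python
# Expansion factor taken from the single example in the text
# (about 2.5 meg compressed, about 5.5 meg decompressed). An assumption.
ROUGH_EXPANSION = 5.5 / 2.5


def parse_listing_line(line):
    """Return (filename, size_in_bytes) from one long-format listing line."""
    # Fields: permissions, links, owner, group, size, month, day,
    # time-or-year, name. maxsplit=8 keeps spaces inside the name intact.
    fields = line.split(None, 8)
    if len(fields) < 9:
        raise ValueError("not a long-format listing line: %r" % line)
    size = int(fields[4])
    name = fields[8].strip()
    # The trailing asterisk is a flag added by the listing, not part of the name.
    if name.endswith("*"):
        name = name[:-1]
    return name, size


def estimated_uncompressed_size(compressed_bytes, expansion=ROUGH_EXPANSION):
    """Very rough guess at the disk space needed after decompression."""
    return int(compressed_bytes * expansion)


line = "-rwxr--r--   1 server   system   2423485 Apr 20  1996 archive_941019.txt.Z*"
name, size = parse_listing_line(line)
print(name, size, estimated_uncompressed_size(size))
```

The estimate is only good for deciding whether you have room on the disk
before you start the download.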

HISTORICAL NOTE:

The prior incarnation of the list (see the reference in my
message in Digest #1) was run by Marty Hoag. I don't remember
exactly when I joined, but I was not on it from the beginning.
It's fuzzy to me at this point, but I think I started getting
mail from the list about 8 years ago. Unfortunately, due to
limited storage space I don't have archives from that period. (I
was working with floppies on my desktop machine and did not have
sysadmin control back at my last school before I left there
(which is an interesting story in itself)).

The current list was restarted by John B. and 
initially headquartered at LBL. The list was moved to ETSU in
December '94 after the "naughty picture" scandal on some of
the Federal Government machines which resulted in most non-
federal gov related mailing lists being terminated (at least
at the sites then current). At this time, I was running the
Adobe Photoshop mailing list, but another list had been
started on the same subject, and since Photo-3d needed a new
home, I transferred my PHOTOSHOP subscriber files to the other list,
and continued on with running the Motorola Mailing list, the ICOM
mailing list, and the newly moved PHOTO-3D mailing list. Subsequently
of course, I've also taken on the Overland-Trails mailing list
(which is now the official mailing list/web site for the
Oregon-California Trails Association).

When the switch to ETSU occurred,
I set up listserv to archive monthly, which accounts for the
change from the irregular dated large files to the monthly files
beginning in Dec/Jan '94-'95.  Eventually, though, it became
clear that even these were getting too large when uncompressed
to handle a whole month at a time conveniently.

About Sept '95, it became clear that a better way to access
the past postings was needed, AND it also became clear that 
the World Wide Web might become a factor in archive
access (an understatement if there ever was one). Also, as
we passed 500 subscribers, the problem of dealing with administrivia
got out of hand (running sometimes a hundred bounced messages
a day, all of which had to be handled manually). So John managed
to pull in some sucker...er, volunteers to distribute some of this
workload - and the 3d-moderators infrastructure was set up.

A variety of mechanisms were considered as to how to go about
this. While some of the threaded mail type http formats are
convenient, there is a problem with this since each message has
its own directory entry. The drawback is that if you have a
large HD, the space for each file's directory info is a fixed
size. It is possible, in fact, for the directory entry to EXCEED
the actual length of the message. In a UNIX based system, you
generally have to make provisions IN ADVANCE as to how many
directory entries you would like to allocate space for. So if you
guess wrong, you can a) run out of disk space for directory
entries or b) waste a bunch of disk space if you specify too many
directory entries or c) seriously degrade performance by
generating too deep a directory tree with many thousands of files
at the wrong levels.
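
For anyone who wants to check this on their own machine, here is a
minimal Python sketch. It assumes that the "directory entries" described
above correspond to what most UNIX filesystems call inodes, and that you
are on a POSIX system where os.statvfs is available. The function names
are made up for the example.

```python
import os


def inode_usage(path="."):
    """Return (total_inodes, free_inodes) for the filesystem holding path."""
    stats = os.statvfs(path)  # POSIX only
    return stats.f_files, stats.f_ffree


def can_hold_more_files(path, new_files):
    """Guess whether new_files more files would fit, by inode count alone."""
    total, free = inode_usage(path)
    if total == 0:
        # Some filesystems do not report a fixed inode count at all.
        return True
    return free >= new_files


total, free = inode_usage(".")
print("inodes: %d total, %d free" % (total, free))
print("room for 10000 one-message files:", can_hold_more_files(".", 10000))
```

This only looks at the entry count, not at the disk space the messages
themselves would take.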

The most reasonable compromise appeared to be to use the DIGEST mailings
as the basic archive unit - which some of you may remember required
some size adjustments due to some services imposing a cap as to how
long an e-mail message can be (like 25k or 50k). By saving each digest
(more or less on a daily basis) the number of directory entries could
be kept under control, but still have a file size which would be
downloadable in a reasonable amount of time (especially with the increase
in modem speeds to 33kbps).
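
To make the size trade-off concrete, here is a minimal Python sketch of
packing messages into digests under a size cap. This is NOT how listserv
actually builds digests; it is only an illustration. The 25k cap and the
33kbps speed are the figures mentioned above, and the function names are
made up for the example.

```python
def build_digests(messages, max_bytes=25_000):
    """Group messages, in order, into digests no larger than max_bytes."""
    digests = []
    current, current_size = [], 0
    for message in messages:
        size = len(message.encode("utf-8"))
        # Start a new digest when adding this message would pass the cap.
        # A single message larger than the cap still gets its own digest.
        if current and current_size + size > max_bytes:
            digests.append(current)
            current, current_size = [], 0
        current.append(message)
        current_size += size
    if current:
        digests.append(current)
    return digests


def download_seconds(size_bytes, modem_bps=33_000):
    """Idealized transfer time, ignoring protocol overhead and line noise."""
    return size_bytes * 8 / modem_bps


messages = ["x" * 10_000] * 5
print([len(d) for d in build_digests(messages)])
print(round(download_seconds(25_000), 1), "seconds for a 25k digest")
```

Fewer, larger digests mean fewer directory entries; smaller ones mean
quicker downloads, so the cap is where the compromise gets made.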

Also about this time the GLIMPSE search engine software became
available at a good price (free :-) and had some VERY nice
features. For example, unlike WAIS database engines, which
typically double the size of the directory with the generated
index, GLIMPSE only increases the directory usage by 10 to 15
percent, yet is VERY fast AND offers boolean (specifically
Regular Expression) searches. So once the digest archives were set
in place and the GLIMPSE engine installed, the monthly large
archive format was stopped in favor of the daily (more or less)
digest format style. Additionally, GLIMPSE offers both a text
based and an http based query interface.
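
If you have pulled a batch of digest files down to your own machine, a
regular expression search of the kind described above can be
approximated with a few lines of Python. To be clear, this is NOT
GLIMPSE and uses no index, so it reads every file on every search. It
assumes the files follow the PHOTO-3D_digest_NNNN.txt naming used in the
digests directory; the function name is made up for the example.

```python
import re
from pathlib import Path


def search_digests(directory, pattern, ignore_case=True):
    """Return (filename, line_number, line) for each line matching pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    # Only look at files named like the digests; sorted so hits come in order.
    for path in sorted(Path(directory).glob("PHOTO-3D_digest_*.txt")):
        with path.open(encoding="latin-1") as handle:
            for line_number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((path.name, line_number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical query: any line mentioning "realist" followed by a word.
    for name, number, text in search_digests(".", r"realist\s+\w+"):
        print("%s:%d: %s" % (name, number, text))
```

A plain scan like this is fine for a few hundred small files, which is
about where an index starts to earn its 10 to 15 percent.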

So, beginning with approximately digest 700, photo-3d is in a new
directory:

/ftp/pub/photo/photo-3d/digests

which contains...

drwxrwxr-x   2 wier     system     39936 Dec  9 00:58 ./
drwxr-xr-x  10 server   system       512 Apr 20  1996 ../
-rwxr-xr-x   1 server   system     66866 Dec  9 00:58 .glimpse_filenames*
-rwxr-xr-x   1 server   system      4200 Dec  9 00:58 .glimpse_filenames_index*
-rwxr-xr-x   1 server   system    737329 Dec  9 00:58 .glimpse_index*
-rwxr-xr-x   1 server   system      8793 Dec  9 00:58 .glimpse_messages*
-rwxr-xr-x   1 server   system   1501560 Dec  9 00:58 .glimpse_partitions*
-rwxr-xr-x   1 server   system      6228 Dec  9 00:58 .glimpse_statistics*
-rwxr-xr-x   1 server   system    262144 Dec  9 00:58 .glimpse_turbo*
-rwxr--r--   1 server   system     23946 Apr 20  1996 3d.clubs.v2.3.txt*
-rwxr--r--   1 server   system    131898 Apr 20  1996 3d.prod.serv.v4.1.txt*
-rwxr--r--   1 server   system       664 Apr 20  1996 HEADER*
-rwxr--r--   1 server   system      1288 Apr 20  1996 NSAcomplete list of files.txt*
-rwxr--r--   1 server   system       525 Apr 20  1996 NSAnaming convention.txt*
-rwxr--r--   1 server   system      5210 Apr 20  1996 NSAv0108a.raw.txt*
-rwxr--r--   1 server   system     11095 Apr 20  1996 NSAv0108f.raw.txt*
-rwxr--r--   1 server   system     40652 Apr 20  1996 NSAv0108s.raw.txt*
-rwxr--r--   1 server   system     72019 Apr 20  1996 NSAv0919f.raw.txt*
-rwxr--r--   1 server   system      4227 Apr 20  1996 NSAv1317a.raw.txt*
-rwxr--r--   1 server   system     70292 Apr 20  1996 NSAv1317f.raw.txt*
-rwxr--r--   1 server   system     50329 Apr 20  1996 NSAv1317s.raw.txt*
-rwxr--r--   1 server   system      1172 Apr 20  1996 NSAv1818a.raw.txt*
-rwxr--r--   1 server   system     20055 Apr 20  1996 NSAv1818f.raw.txt*
-rwxr--r--   1 server   system     10887 Apr 20  1996 NSAv1818s.raw.txt*
-rwxr--r--   1 server   system     15002 Apr 20  1996 NSAv2021a.tmp.txt*
-rwxr--r--   1 server   system      7316 Apr 20  1996 NSAv2021f.tmp.txt*
-rwxr--r--   1 server   system     56902 Apr 20  1996 NSAv2021s.tmp.txt*
-rwxr--r--   1 server   system     15894 Apr 20  1996 NSAv2021t.tmp.txt*
-rw-r--r--   1 server   system      2656 Apr 20  1996 PHOTO-3D_digest_0666.txt
-rw-r--r--   1 server   system      2716 Apr 20  1996 PHOTO-3D_digest_0694.txt
-rw-r--r--   1 server   system      1512 Apr 20  1996 PHOTO-3D_digest_0695.txt
-rw-r--r--   1 server   system      4799 Apr 20  1996 PHOTO-3D_digest_0700.txt
-rw-r--r--   1 server   system     12989 Apr 20  1996 PHOTO-3D_digest_0709.txt
-rw-r--r--   1 server   system     36578 Apr 20  1996 PHOTO-3D_digest_0710.txt
-rw-r--r--   1 server   system     29812 Apr 20  1996 PHOTO-3D_digest_0711.txt
-rw-r--r--   1 server   system     17803 Apr 20  1996 PHOTO-3D_digest_0712.txt

etc etc etc

-rw-r--r--   1 server   system     23219 Dec  9 00:13 PHOTO-3D_digest_1739.txt
-rw-r--r--   1 server   system     26877 Dec  9 00:13 PHOTO-3D_digest_1740.txt
-rw-r--r--   1 server   system     24691 Dec  9 00:13 PHOTO-3D_digest_1741.txt
-rw-r--r--   1 server   system     25374 Dec  9 00:13 PHOTO-3D_digest_1742.txt
-rwxr-xr-x   1 server   system      1356 Dec  9 00:56 ghindex.html*
-rwxr--r--   1 server   system     45207 Jan  6 05:02 p3d01-05*
-rwxr--r--   1 server   system     27830 Jan  6 05:02 p3d06-10*
-rwxr--r--   1 server   system     56430 Jun  3  1996 p3d10-15*
-rwxr--r--   1 server   system     45588 Jun  3  1996 p3d16-20*
-rwxr--r--   1 server   system     57804 Jun  3  1996 p3d21-29*
-rwxr--r--   1 server   system     14916 Jun  3  1996 p3d30-39*
-rwxr--r--   1 server   system     79751 Jun  3  1996 p3d40-49*
-rwxr--r--   1 server   system     45606 Jun  3  1996 p3d50-59*
-rwxr--r--   1 server   system     63351 Jun  3  1996 p3d60-69*
-rwxr--r--   1 server   system    102783 Jun  3  1996 p3d70-79*
-rwxr--r--   1 server   system    182591 Jun  3  1996 p3d80-89*
-rwxr--r--   1 server   system    106131 Jun  3  1996 p3d90-99*

Things got a bit hinky when we had a hacker attack in March of '95
and Bobcat was killed dead (all files erased). Fortunately, between
what I had on tape on my desktop machine and John B.'s files, we
were able to reconstruct the archives (mostly - some things were
permanently lost, which is why some of you might have noticed
a few of the "information" files missing which were there before).

In talking about the digest format now in use, the 3d-moderators
felt it would be worthwhile to re-generate the earlier, LBL-based
files in the current format of being more or less digest based.
John B. managed to get the first 100 extracted from the very large
LBL archive files, but doing so is extremely labor intensive.
Thus in the file listing above, you can see groups of about 10 digests
from number 1 up thru number 99. Hopefully one of us can break
loose some time to proceed on with re-generating the other 600 early
ones. I just noticed, in fact, that our numbering scheme was slightly
confusing as there were different length file names depending
on the numbers involved, which I've fixed. If we can get numbers
100 - 699 (more or less) done, we'll have to change it again
to handle 3 digits.
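
The reason the name lengths matter is that names padded with leading
zeros sort in numeric order, while unpadded ones do not. Here is a
minimal Python sketch of the idea. The name patterns are copied from the
listing above, but the functions themselves are made up for the example
and are not the scripts actually used on the server.

```python
def digest_filename(number, width=4):
    """Name for a single digest, e.g. PHOTO-3D_digest_0700.txt."""
    return f"PHOTO-3D_digest_{number:0{width}d}.txt"


def group_filename(first, last, width=2):
    """Name for a group of early digests, e.g. p3d01-05."""
    return f"p3d{first:0{width}d}-{last:0{width}d}"


print(digest_filename(700))
print(group_filename(1, 5))
# A width of 3 would be needed once groups above digest 99 exist.
print(group_filename(100, 109, width=3))

# Padded names sort the same way the numbers do.
names = [digest_filename(n) for n in (1742, 99, 700)]
print(sorted(names))
```

Changing the width later means renaming the existing files too, which
is the "change it again" mentioned above.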

So that's the scoop. To sum up, you can access the DIGEST archives
(beginning about 700) either via anonymous ftp, the GLIMPSE search
engine, or by sequential web browsing.

To access the archives prior to that, you can use anonymous ftp,
or sequential web browsing for digests 1 - 99.

For digests #100 thru about 699, you'll have to use anonymous ftp
from the *archive* directory (as opposed to the digests directory),
and un-compress them on your local machine.
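
The summary above boils down to a small lookup, sketched here in Python.
The two directory paths are the ones given earlier in this message; the
cut-off numbers (99 and 700) are the approximate ones from the text, so
treat the boundaries as fuzzy. The function name is made up for the
example.

```python
ARCHIVE_DIR = "/ftp/pub/photo/photo-3d/archives"
DIGEST_DIR = "/ftp/pub/photo/photo-3d/digests"


def where_to_look(digest_number):
    """Return (directory, how_to_get_it) for a digest number.

    The boundaries are approximate, as they are in the text.
    """
    if digest_number < 1:
        raise ValueError("digest numbers start at 1")
    if digest_number <= 99:
        return DIGEST_DIR, "grouped p3d files; anonymous ftp or web browsing"
    if digest_number < 700:
        return ARCHIVE_DIR, "compressed .Z files; anonymous ftp, then uncompress"
    return DIGEST_DIR, "single digest files; anonymous ftp, GLIMPSE, or web browsing"


for number in (42, 350, 1742):
    print(number, where_to_look(number))
```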

If anyone has any questions, drop me a line....

Disclaimer - this is the sequence of events as best as I remember
them, but my memory may have slipped a cog, and John B. will probably
fill in some gaps.

THANKS!

    ====== wier@xxxxxxxxxxxxxxx ======
  5:28 AM Monday, January 6, 1997
   keeper of the Photo-3d, Motorola
 MC68HC11, Overland-Trails, LDS State
Research Outline Guides and other stuff
     (currently in Ouray, Colorado)






------------------------------

End of PHOTO-3D Digest 1795
***************************
***************************
 Trouble? Send e-mail to 
 wier@xxxxxxxxxxxxxxx 
 To unsubscribe select one of the following,
 place it in the BODY of a message and send it to:
 listserv@xxxxxxxxxxxxxxx 
   unsubscribe photo-3d
   unsubscribe mc68hc11
   unsubscribe overland-trails
   unsubscribe icom
 ***************************