paperlined.org
cgi-bin > porn > tgp_condom
document updated on Dec 6, 2007
to-do:
    - convert the metadata to .xml format...  the idea is to, OVER TIME, separate the code from the
      data as much as possible...  this isn't even 100% possible currently, but perhaps it will be
      eventually?

    - move as many domains as possible over to the getkey/mapkey system, rather than using more
      regexps than are necessary

    - move to a paradigm where the tool TELLS ME when it needs extra information...  it highlights all
      the possible problems (eg. possible extra affiliate ID; possible extraneous www.; unrecognized
      host; etc)  (so that I don't have to constantly double-check on it...  it knows much more
      quickly than I do)

    - do more pre-processing on the host rules, so that the metadata-specification file can be
      simpler while the matching still runs quickly
        - at the very least, allow ALL of one host's data to be located in ONE spot
    
    - (possibly?) allow the tool to UPDATE the metadata itself, after I directly answer questions
      from it...
        - ESPECIALLY when there are %per_page_key's...  it should be able to FIGURE out what each
          key is, automatically!, and then cache its result
        - it should be able to FIGURE OUT if www. redirects to the base one or not (and vice versa)

    - begin including metadata on DIFFERENT TYPES of pages...  right now, it's really FHG-centric...
        start including metadata on TGPs, TGP-thumbs, and FHG-thumbs.
                (I've got TGP and TGP-thumb metadata included in the main module, but again, it
                needs to start being separated out)
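
For the "tool TELLS ME" item above, here is a minimal sketch of what the warning pass might look like. It is in Python rather than the tool's perl, purely for illustration. The KNOWN_HOSTS table, its "expected_keys" field, and the diagnose() function are all hypothetical names, not part of the existing module, and treating every unexpected query key as a possible affiliate ID is an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical metadata: known hosts and the query keys each one is expected to use.
KNOWN_HOSTS = {
    "example.com": {"expected_keys": {"gallery"}},
}

def diagnose(url):
    """Return a list of human-readable warnings for one URL (empty if nothing looks off)."""
    warnings = []
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Compare against the metadata with any leading "www." stripped.
    bare = host[4:] if host.startswith("www.") else host
    if bare not in KNOWN_HOSTS:
        warnings.append(f"unrecognized host: {host}")
        return warnings
    if host != bare:
        warnings.append(f"possible extraneous www.: {host}")
    # Assumption: any query key the metadata doesn't list might be an affiliate ID.
    extra = set(parse_qs(parts.query)) - KNOWN_HOSTS[bare]["expected_keys"]
    for key in sorted(extra):
        warnings.append(f"possible extra affiliate ID: {key}")
    return warnings
```

The point of returning plain strings is that the same list can be shown inline in real-time mode or collected into a report later; the tool only interrupts me when the list is non-empty.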



Anyway, this CAN be a more gradual transition:

    iteration 1: code and data tightly integrated; data formatted purely for speed, with no regard for readability/maintainability

    iteration 2: adds a preprocessor, so that data is stored in a format that's readable/maintainable (though still in perl) but converted to a format that's speedy;
                    at least SOME pressure to separate code and data whenever possible

    iteration 3: convert data from perl to XML format; increasing pressure to move hard-coded exceptions to pure-data

    iteration 4: ALL hard-coded exceptions have been moved to pure-data;
            XML format is writeable as well as readable, allowing the code to move its proprietary cache data into
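
As a rough illustration of the iteration-2 preprocessor idea (readable storage, speedy lookup, ALL of one host's data in ONE spot), here is a small sketch. It is in Python rather than perl, and the hosts, the "aliases" and "fhg_path" fields, and the preprocess()/classify() helpers are invented for the example; the real rules are certainly more involved.

```python
import re

# Readable, maintainable form: everything about one host lives in one entry.
# Hosts, patterns and field names here are made up for illustration.
READABLE_RULES = {
    "example.com": {
        "aliases": ["www.example.com"],
        "fhg_path": r"/galleries/\d+/index\.html",
    },
    "example.net": {
        "aliases": [],
        "fhg_path": r"/fhg/[a-z0-9_-]+/",
    },
}

def preprocess(rules):
    """Compile the readable rules into a flat host -> (canonical host, regex) table."""
    table = {}
    for host, rule in rules.items():
        compiled = re.compile(rule["fhg_path"])
        # The canonical host and each alias all point at the same compiled rule.
        for name in [host, *rule["aliases"]]:
            table[name] = (host, compiled)
    return table

def classify(table, host, path):
    """Return the canonical host if (host, path) looks like an FHG page, else None."""
    entry = table.get(host.lower())
    if entry is None:
        return None
    canonical, pattern = entry
    return canonical if pattern.fullmatch(path) else None
```

Matching then costs one dictionary lookup plus one regex per URL, while the file a human edits stays grouped by host. Swapping the perl/Python literal for XML (iteration 3) would only change how READABLE_RULES gets loaded, not preprocess() itself.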
                        


Other TODO items (unrelated to the "multiple great migrations" outlined above):

    - remember ALL FHG URLs we run across...  at the very least, this can be useful for historical
      analysis (if not making our own collection of stuff)
            - possibly even cache the thumbnails?

    - START BEING ABLE TO RUN GOOGLE DATA (as well as historical TGP data) through this in BATCH
      MODE.  It should continue to support the real-time mode as well, but since the metadata here
      is increasingly becoming the most valuable part of this, it would be nice to 1) leverage it
      for other things, 2) make it more mature by using it on a wider variety of data.

    - RECORD ALL TGP->FHG relationships...  while this takes up extra space, and isn't
      TERRIBLY valuable...  it's still actually useful data, and since we're trying to get the
      most-bang-for-the-byte-downloaded, we should record this too
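
A possible shape for the "remember ALL FHG URLs" and "RECORD ALL TGP->FHG relationships" items, sketched in Python with the standard sqlite3 module. The table names, columns, and the open_store()/record_batch() helpers are assumptions made up for this sketch, and SQLite itself is just one convenient choice of store.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the store; ':memory:' keeps everything in RAM for experiments."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS fhg (url TEXT PRIMARY KEY, first_seen TEXT)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS tgp_fhg ("
        "tgp_url TEXT, fhg_url TEXT, PRIMARY KEY (tgp_url, fhg_url))"
    )
    return db

def record_batch(db, pairs, seen_on):
    """Record (tgp_url, fhg_url) pairs; rows that are already stored are left untouched."""
    pairs = list(pairs)
    # INSERT OR IGNORE keeps the original first_seen date for URLs we already know.
    db.executemany(
        "INSERT OR IGNORE INTO fhg VALUES (?, ?)",
        [(fhg_url, seen_on) for _, fhg_url in pairs],
    )
    db.executemany("INSERT OR IGNORE INTO tgp_fhg VALUES (?, ?)", pairs)
    db.commit()
```

The same function would serve both modes: real-time mode calls it with the pairs from a single page, while batch mode feeds it a whole file of historical pairs at once.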





Priority/schedule:
    - start recording all FHG URLs  (we can record their thumbnails at a later time perhaps)
    - transition from iter#1 to iter#2
    - have it highlight metadata improvements that it's almost certain need to be made
    - get the batch-mode version working (either with historical data, or google data, or both)



What are the goals here?

    - make a large (but static) archive of plain FHG URLs available?
            (doesn't lend itself to a "community", but they're much simpler to digest anyway)

    - make the metadata available
            (lends itself to community-sharing...  the number of people who can write or digest
            these things is pretty small, but on the other hand, they're EXTREMELY useful)

    -