Security Issues for Registries

1. Introduction

This is a short paper on the topic of security for a shared TLD registry. Since there isn't a design in place for the CORE DB it is necessarily at a fairly general level. The first section deals with general security issues as they relate to a registry. Then I address some architectural issues, some issues about protocols, and so on. I apologize in advance for the somewhat helter-skelter organization -- the document is really only half baked. There is never enough time, unfortunately. The bright side is that there are plenty of holes for other people to fill in...

Security is always a continuum, and a set of tradeoffs. We can identify several objectives or requirements we wish to minimize or maximize: security, convenience of access, cost of implementation, speed of implementation, and cost of maintenance. Perfection is highly unlikely. However, in this particular case we can probably do a pretty darn good job.

Our one absolute functional requirement is that DNS and whois data be accessible to the net at a sufficient performance level. On the other hand, the registry database itself does not need to be easily available. This leads to the obvious conclusion that these functions can be separated. And indeed, while securing DNS and whois access is important, it is a separable problem, and I don't really address it here. I concentrate on the registry database operation and its interactions with the registrars and others.

Some time could be spent on prioritizing the other requirements, but I don't think it is worth doing unless there is serious controversy.

1.1 Outline

The document is arranged as follows:

1. Introduction
1.1 Outline
1.2 Note on terminology
1.3 Note on distribution
2. General security concerns
2.1 Risk assessment
2.2 Damage
2.3 Specific targets
2.3.1 Private keys
2.3.2 Registry database
2.3.3 Financial database
2.3.4 Other data
2.4 Probability of attacks
2.5 Attackers
2.6 Types of attacks
3. Specific architectural considerations
3.1 General
3.2 Interaction with registrars -- transitive security breaches
3.3 Internal design of the registry
3.3.1 Isolated
3.3.2 Special internal protocol
3.3.3 Standard LAN protocols
3.3.4 Fully connected
3.4 Discussion
3.4.1 Architecture
3.4.2 Performance issues
4. More detail on protocols
4.1 The "slow" protocol
4.2 FTP or other access to mission-oriented files
4.3 Direct DNS
4.4 Direct whois
4.5 Remote administration
4.6 Interactive access
5. Key management
5.1 Protecting keys
5.2 Generating new keys

1.2 Note on terminology

From the IAHC recommendations:

A REGISTRY comprises the roles and activities involved in the administration of a portion of the Domain name space. With respect to the work of the IAHC, a registry pertains to a single gTLD and encompasses all of the services needed for assignment and maintenance of that TLD and its registrations.

I use the term "registry" in the generic sense above, as well as in the specific sense of being restricted to a single TLD, because we need a general term for the function. Thus, coredb as a whole represents a registry, AND coredb is also composed of registries. So coredb IS the CORE registry, and the CORE registry is potentially composed of smaller registries. Furthermore, CORE as an organization may have business data not directly tied to a registry database.

1.3 Note on distribution

The issue of whether the CORE registry is subdivided is relevant to security for the following reason: one of the basic security choices is between 1) putting all your eggs in one heavily guarded basket, or 2) spreading your eggs among different baskets. In the first case a loss is rare but catastrophic; in the second case a loss is more frequent, but the damage is small. The recent problems with .com and .net are instructive, I believe, and there is an interesting lesson -- none of the other couple hundred TLDs were affected.
Contrasting one catastrophic failure with several small localized failures doesn't mean much by itself -- the sum of the small bits of damage could be greater than the damage from the one big failure. The true advantages of distribution are actually more subtle: a distributed system can be resilient instead of brittle; small disasters teach lessons, while catastrophes destroy; a distributed system can evolve, while a monolith has to be rebuilt from scratch; a distributed system can draw its resources from across the landscape, while a centralized system concentrates them in one area.

2. General security concerns

2.1 Risk assessment

There are various ways of thinking about risk -- the level of damage that might occur, specific targets that might be hit, the probability of various attacks, who the attackers might be, and the various methods or types of attacks. Books have been written on this subject, of course. Here are some views specific to registries.

2.2 Damage

The biggest risk is permanent disruption of DNS, that is, disruption of DNS that makes it impossible to quickly rebuild the zones. Depending on the number of domains lost, this could be a first order catastrophe -- a large part of the net would be effectively out of commission for an indefinite period. Such a failure would involve loss of backups, and would seem to require a large bomb to take out a machine and its backup tapes. [Note that bombing actually needs to be considered as a security issue. A catastrophic failure of DNS, wiping out half the net for months, is a potential terrorist goal.]

But a bomb is not necessary. A malicious and extremely competent cracker could break in to the database machine, subvert the process that does backups, wait for a few months, and then crash the machine and wipe out all its data. More likely, a disgruntled employee could do the same thing with far greater ease.
An incompetent employee who did backups incorrectly could also cause tremendous problems on the unlucky day the disks crash.

Short term seriously disruptive scenarios are much easier to imagine -- any trusted but disgruntled registrar employee could delete all the domains the registrar managed (those that didn't have a zone key associated with them, that is). And a registrar with lax security procedures is a threat to the whole system.

Loss of whois data would not be as disruptive as a loss of DNS, but it would still be a major hit -- rebuilding the data would be very costly.

However, while these major catastrophes are possible, there are many minor attacks that individually require little more than watchful attention to deal with. That attention is a cost in itself.

2.3 Specific targets

There are three targets of special interest -- private keys, the registry database, and the financial database.

2.3.1 Private keys

My assumption is that the registry database will have a private key that it uses to sign replies and, occasionally, to decrypt messages. Registrars will also have private keys, used to sign requests and other things. Interestingly enough, these will in fact be low security keys -- several people will need to have access to them, and they will be in constant use, so they have to be considered low security. They should be changed frequently. Theft of a key, therefore, is not as serious as it may first appear. Of course, a root compromise of a machine with keys means that the keys are compromised as well.

It will be necessary to keep databases of all revoked or expired public keys indefinitely, to validate old signatures. (This function is generally handled by a "Certificate Authority", or CA.) These databases need to be well protected -- but since they are public, and relatively small, they can be replicated widely.
The most critical thing about them is maintaining integrity -- if the registry itself is performing the CA function, it could generate a file containing all expired keys and sign it with its current key.

2.3.2 Registry database

The registry database is of course susceptible to damage. It goes without saying that the machine(s) it runs on should be reliable, that there should be extensive backups and maybe hot spares, and that it should be well protected. It is worth noting, however, that the registry database is completely redundant. It could, in principle, be completely rebuilt from either 1) the local records that registrars will keep, or 2) DNS and whois data. Such a rebuild, without special preparation, would take at least a day, perhaps several.

2.3.3 Financial database

The registry database will contain expiration times for all the domains, so there is no need for separate records of money owed on a per domain basis. However, there needs to be a much, much smaller database that keeps track of what each registrar owes (or, conversely, how much remains in a registrar's account). This, I presume, will be part of a small database the registry keeps on its registrars, which will include other material such as registrar keys. And while this database is so small as to seem incidental, it is an integral part of the whole system, and will be in constant use. Damage to this database would shut down the registry until it could be rebuilt.

2.3.4 Other data

- A database of public keys of registrars, and all past public keys. (Essentially Certificate Authority type data.)
- System configuration data
- Business data -- addresses, email, documents, etc.

2.4 Probability of attacks

In addition to the level of damage, the probability of an attack must be considered. In general this is quite hard to do -- every site is different, and reliable data on the frequency of attacks is very hard to come by.
The CORE DB, especially, is a unique case -- it will be an important component of the net infrastructure, in some ways very visible and in other ways not so visible, and it has a legacy of bitter controversy behind it.

2.5 Attackers

It is useful to consider the various categories of potential attackers of coredb:

As an identifiable critical component of the net infrastructure, coredb is a target for any anti-technology terrorist. In this case physical attacks are a real possibility.

Recent attacks against NSI by people apparently associated with the AlterNIC demonstrate quite clearly that technologically adept netizens with various axes to grind can cause significant disruption.

There is a quite large and varied population of crackers, with a wide variety of abilities. Some are barely literate; some are bored students; some have absolutely top-notch technical skills. However, the technical sophisticates have produced extremely powerful and easy to use tools that allow extremely dumb people to break in and cause serious damage. Most large corporate nets, I believe, get hits from this class on a regular basis. I expect "doorknob rattles", at least, from the first day coredb goes live.

Insiders are in many ways the most dangerous attackers of all, since they know every detail of the operation and have trusted access. Two things can mitigate the insider threat -- first, appropriate personnel policies, and second, a means of reconstructing the data from sources distributed through different organizations.

A sophisticated cracker will have a variety of tools available, and plenty of time. Such a cracker can infiltrate a large LAN over a period of months, take over machine after machine, install sniffers, trapdoors, and alarms in many places, and do nothing but use it as a base for further operations. Such individuals and groups do exist, but a registry database is probably not a really interesting target for them.
2.6 Types of attacks

There are a variety of attacks to consider: denial of service; attacks against the (as yet undefined) registry protocols; attacks through the standard application level protocols (telnet, FTP, HTTP, SMTP, rexec); lower level attacks against the IP protocol suite; social engineering attacks; and physical attacks on equipment or people. In the interests of space, time, and the reader's patience, I will not go through these attacks in detail.

3. Specific Architectural Considerations

3.1 Overview

A registry provides a network service, and so must be connected to the network. There are many possible interfaces with the Internet. Here are some appropriate for a registry:

1) A "slow" database modification and query interface for registrars, with email-level propagation speeds

2) FTP or other file transfer service for providing data -- zone files, whois master files, and other mission-oriented data [I use the term "whois" *very* loosely in this paper, by the way]

3) A direct interface for master DNS service

4) A direct interface for master whois service

5) A login interface to the database for remote administration by authorized administrators

6) A "fast" interactive database interface for registrars, possibly limited to queries

These 6 constitute the basic functional connectivity of a registry. We also might consider:

7) "Commercial" connectivity, for email, FTP, web, and other net access to support the business needs of the operation. This is difficult to secure. I mention it only for completeness -- it is obvious that commercial connectivity should be divorced from the functional connectivity of the registry, and, unless specifically mentioned, all subsequent discussion applies only to the above 6 categories of functional connectivity.
These interfaces are approximately ordered by my subjective view of their importance to the functioning of the registry -- I consider 1 through 4 as representing the essential functions of a registry (though 3 and 4 could be done remotely), and 5 and 6 as niceties. 7 is important to the business of the registry, but not to its function. They are also roughly in order as far as insecurity is concerned -- this basically follows from the general rule that systems with limited functionality are much easier to secure than general purpose systems. Note that 1, 2, 5, and 6 are tightly coupled, and would likely run on the same machine or LAN, whereas 3, 4, and 7 can be run peripherally, or even remotely.

3.2 Interaction with registrars -- transitive security breaches

Some of the functional interfaces, to a greater or lesser degree, extend trust to the registrars or elsewhere. So a security breach at a registrar can lead to a security breach of the registry, or damage to the registry data. Registrars will have general Internet access, web servers, email, FTP, a local database with business records, and employees with varied expertise and commitment. The registrars will be a diverse lot, and it probably is not feasible to enforce a strong security policy across all registrars. Additionally, registrars will be juicy targets.

We can expect with high probability, therefore, that a registrar's site will be cracked, with root compromises that persist undetected for some time. This could potentially compromise all of the registrar's data, including secret keys. Thus a cracker (or a disgruntled employee) would have ample opportunity to damage the registry data. I believe that the probability of such damage occurring over the next 5 years or so is very high -- so high that it would be completely irresponsible not to plan for it. Note that this damage could be done through normal use of the protocols -- no further cracking is necessary once a registrar is compromised.
The implications are very important: a registry MUST be designed to resist damage from a completely treacherous registrar.

3.3 Internal design of the registry

The central component of the registry is a database server. The data sets in this server are the crown jewels of the registry, and must be carefully protected, backed up, and archived. On the periphery are possibly email servers, DNS servers, whois servers, FTP servers, and other machines to handle billing and other business functions. I assume a firewall or screening router is always used. Here are some possible models:

3.3.1 Isolated

It is not strictly necessary that the database server be connected to the network -- a truly paranoid registry could run the server as a stand-alone machine, and carry diskettes full of email requests (received at a front end machine) and replies back and forth periodically. Every night a tape could be written to distribute the new zone files and whois data. Such an arrangement would reduce the risk of network attack on the data to essentially zero. Highest security, low implementation cost, human attendance required.

3.3.2 Special internal protocol

With some minimal loss of security the diskettes could be replaced with dedicated network links designed to transfer only requests and replies. Such an arrangement could approach interactive performance, but keep the data almost totally safe from any kind of network attack. However, these dedicated network links would have to be done carefully, and would not use a standard protocol, and thus would be somewhat expensive to implement. High security, high development cost, moderate maintenance.

3.3.3 Standard LAN protocols

In this case the front-end machine communicates with the database machine through a standard database-query protocol. [The front end, or a separate machine, also provides the output services.]
This is still substantially more secure than connecting the database machine directly to the net, because a cracker has to make it through the front end undetected. Good security, moderate development cost, moderate maintenance.

3.3.4 Fully connected

In this scenario the database machine is functionally connected to the net [through the screening router or firewall]. All the registrar-registry protocol processing and all the preprocessing [round robin enforcement] takes place on the database machine. Output services are also provided from the same machine. Lowest security, least cost, moderate maintenance.

3.4 Discussion

3.4.1 Architectural choice

At my most paranoid, I would recommend the "Isolated" model. From a performance standpoint it would give a registration turnaround of a few hours, and an activation turnaround of a day. The inclusion of humans in the processing loop -- even just moving diskettes and tapes around -- is a *big* drawback, however.

At the other end, the "Fully connected" model is very cheap and quick to implement, and could be used as an initial implementation using COTS software, moving to something more secure over time. However, in addition to security considerations there are other, operational reasons to separate the database machine from a front end -- for one, the database machine should be a *very* high reliability machine (if not replicated).

This leads to the following, obvious, architecture, which I will consider the canonical architecture: a front end machine, connected to the net through a screening router or firewall, that handles the registry-registrar protocol. This machine is connected in turn to the database server(s) via an internal LAN. The database server also connects to DNS/whois/FTP servers over this same LAN, and these output servers are in turn connected to the Internet through a firewall. There are definite security advantages to having a separate machine perform each function, but there are cost tradeoffs as well.
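The canonical architecture above can be thought of as a graph of permitted connections. Here is a minimal sketch, in Python, that checks the key property -- the database is reachable from the Internet only through the front end. All node names and the exact set of links are my own illustrative assumptions, not part of any actual design:

```python
# Hypothetical connection graph for the canonical architecture:
# each pair (a, b) means "a is allowed to open a connection to b".
ALLOWED = {
    ("internet", "front_end"),                       # RR protocol, via firewall
    ("internet", "dns"),                             # public DNS service
    ("internet", "whois"),                           # public whois service
    ("internet", "ftp"),                             # public file access
    ("front_end", "db"),                             # database-query protocol, internal LAN
    ("db", "dns"), ("db", "whois"), ("db", "ftp"),   # db pushes updates to output servers
}

def reachable(start, blocked=frozenset()):
    """Set of nodes reachable from `start`, skipping any `blocked` nodes."""
    adj = {}
    for a, b in ALLOWED:
        adj.setdefault(a, set()).add(b)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return seen

# The database can be reached (indirectly) in normal operation...
assert "db" in reachable("internet")
# ...but only via the front end: remove it, and no path to the db remains.
assert "db" not in reachable("internet", blocked={"front_end"})
```

The point of the sketch is simply that every inbound path to the crown jewels funnels through one limited-function machine, which is exactly the property that makes the front end worth hardening.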
Note that the DNS/whois servers, in particular, could be sited remotely, and pick up their updates from an FTP site. VPNs between sites could keep these updates secure -- modern firewalls (e.g. Gauntlet, by TIS) support these functions at low cost. In this case, a single front end (perhaps replicated for robustness) could handle the RR protocol and provide FTP service. Here's a crude diagram:

I |   ------        -------------      ------
N |---| FW |--------| Front end |------| DB |
T |   ------        |  & ftp    |      ------
E |                 -------------
R |   -----------
N |---| TLD DNS |
E |   -----------
T |   ---------
  |---| whois |
      ---------

3.4.2 Performance issues

[Caveat -- I am not a database expert.]

If the database does not support direct queries, the performance demands on the primary database are not very high at all: a million registrations in a year is less than 3000 database updates a day -- a few every minute. Given that there are currently around a million domain names total, any database software whatsoever could handle the update load. And even if contact info is updated ten times as frequently as domain names are created, we are nowhere near high end database performance.

There is perhaps a bigger performance issue in the "report" generation phase (producing files containing whois and DNS zone file data), or if direct queries to the database were allowed. On the face of it, I see no reason at all to support direct queries. If public whois/DNS data is current to within one day, the vast majority of conflicting cases can be detected through that medium. Without a mechanism that allows a registrar to lock a name, the possibility of a close call (where you check the database, find the name free, register it, and get an error return because someone else just registered it) cannot be eliminated. But in practice there is little difference between locking a name and registering it. [Note: In earlier discussions about a protocol I believed that some kind of locking mechanism was necessary.
But after thinking about it in detail, and discussing it with several other people, I now believe that a name lock is superfluous.]

4. The protocols, in a little more detail

4.1 The "slow" registry protocol

Though an ad hoc protocol may be used, the model is email. Requests are received, queued, and processed; the results are sent out via the same protocol. The server end must have a daemon listening constantly for requests; the client host may have a server listening for replies, or the client application software may maintain an open connection for a reply. Authentication is primarily through digital signatures, with IP address/domain name validation in addition.

The server side of this protocol can be very secure -- the only likely attack would be a denial of service (DoS) attack, through flooding. The server should authenticate all transactions with digital signatures. In addition, the set of registrar addresses/domain names that are allowed to send is known, and relatively small, so filtering on IP address/domain name (with router filtering, a firewall, or TCP wrappers) should be done. That is, requests from unauthorized addresses would be junked before the signature was ever checked.

However, client side security is more complex. The client host will undoubtedly be connected with the registrar's database and billing software, and will likely have general Internet connectivity. Requiring all registrars to adhere to strict security policies is probably not feasible politically. So a complete crack of a registrar's site is quite possible, which would mean that all of its keys were compromised. In fact, if the cracker is careful and knowledgeable he could register domain names for some time before being caught. As a matter of recommended practice it would be best if each registrar had a single host that was used to make changes to the registry database, and that host were used only for that purpose.
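The server-side acceptance logic described above -- junk requests from unknown addresses before any signature work is done -- can be sketched briefly. Since the registry protocol is as yet undefined, everything here is an assumption for illustration: the registrar addresses are invented, and an HMAC over a shared secret stands in for the real public-key digital signature:

```python
import hashlib
import hmac

# Hypothetical table of known registrar source addresses and their keys.
# (A real deployment would verify public-key signatures, not HMACs.)
REGISTRARS = {
    "192.0.2.10": b"registrar-a-secret",
    "192.0.2.20": b"registrar-b-secret",
}

def accept_request(src_addr, payload, signature):
    """Return True only for a validly signed request from a known address."""
    key = REGISTRARS.get(src_addr)
    if key is None:
        # Unknown sender: discard immediately, before any signature
        # check -- cheap filtering absorbs most of a flooding attack.
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

good_sig = hmac.new(b"registrar-a-secret", b"register example.web",
                    hashlib.sha256).digest()
assert accept_request("192.0.2.10", b"register example.web", good_sig)
assert not accept_request("203.0.113.5", b"register example.web", good_sig)
assert not accept_request("192.0.2.20", b"register example.web", good_sig)
```

The ordering is the point: address filtering is nearly free, while signature verification is comparatively expensive, so the cheap check goes first.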
Note that read-only queries of the registry database need not be so restricted. [But the public whois database would serve 99% of the time.]

4.2 FTP or other access to mission-oriented files

Part of the mission of a registry is to make its information available to the public. One method is to make available files containing all the information. The obvious implementation is FTP. Making an FTP server secure is a well-understood problem. Furthermore, if the FTP server is on a front end machine, the probability of containing any damage is pretty high.

4.3 Direct DNS

As recent events have demonstrated, current implementations of DNS are not secure. It will absolutely be necessary to use the latest implementation of BIND that supports secure DNS. As indicated above, however, I believe the public DNS servers should be separated from the registry.

4.4 Direct whois

[To be discussed later]

4.5 Remote administration

Remote administration of the database server, even through a relatively secure channel such as ssh, is probably not a good idea. It means that you must extend trust to the machine from which the remote login is coming -- a machine outside the firewall, and of uncertain security. A compromised database administrator account could do tremendous damage. The convenience is not worth the risk, IMO.

4.6 Interactive DB access

Interactive DB access is much less of a security issue than administrator access, especially if access is limited to queries. However, for the reasons outlined above, I think such access is an unnecessary luxury that would add a lot to the cost of the total system.

5. Key management

5.1 Protecting keys

I mentioned above that the private keys in the system are low security keys. I base this statement on the fact that many people will be using them -- every registrar employee who sends in a registration will use the registrar's key to sign the request, for example.
The conventional method of securing a key is to encrypt it, and to use a human-supplied password to decrypt the key when it is used. As a consequence, to use the key you need three things -- the encrypted key, the software needed to decrypt it (that is, the encryption algorithm), and the password. Normally, the security of the key is preserved through careful control of the password, but that control is much harder to maintain when multiple people use the same password. Schemes for different passwords to the same object could be used, or other, more complex schemes, but the fundamental problem of human failure remains. The situation can be mitigated somewhat by very carefully protecting the key itself, and by arranging the software so that employees can only access the key through a secure software interface.

5.2 Generating new keys

Keys should be regenerated on a regular basis, and whenever a problem arises. Since the registry database is central, it makes sense to use it as a coordinator for new key generation. However, registrar keys should be generated locally -- it would be a bad idea for the registry db to generate all the keys. Therefore a coordination protocol should be developed. [This needn't be complicated -- in the normal case it is completely straightforward, I believe. In the case where a key has been seriously compromised things are a little more complicated, but not seriously so.]
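The conventional arrangement described in 5.1 -- a private key stored encrypted, unlocked by a human-supplied password at time of use -- can be sketched as follows. This is a toy: the password-to-key derivation (PBKDF2) is a standard construction, but the XOR keystream cipher here is purely illustrative, and a real system would use a vetted cipher. The key material and password are of course invented:

```python
import hashlib
import os

def _keystream(secret, n):
    # Expand a derived secret into n pseudo-random bytes (toy construction).
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_key(private_key, password, salt):
    # Derive an encryption key from the human-supplied password...
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
    # ...and mask the private key with it (illustrative XOR cipher only).
    stream = _keystream(derived, len(private_key))
    return bytes(a ^ b for a, b in zip(private_key, stream))

def decrypt_key(blob, password, salt):
    return encrypt_key(blob, password, salt)  # XOR is its own inverse

salt = os.urandom(16)
key = b"-----hypothetical registrar private key-----"
blob = encrypt_key(key, b"correct password", salt)

assert blob != key                                      # stored form is masked
assert decrypt_key(blob, b"correct password", salt) == key
assert decrypt_key(blob, b"wrong password", salt) != key
```

Note how the sketch exhibits exactly the three things the text identifies: the encrypted key (`blob`), the algorithm (the functions), and the password -- and how nothing in it addresses the real weakness, which is many employees sharing that one password.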