Security Issues for Registries

1. Introduction

This is a short paper on the topic of security for a shared TLD registry. Since there isn't a design in place for the CORE DB it is necessarily at a fairly general level. The first section deals with general security issues as they relate to a registry. Then I address some architectural issues, some issues about protocols, and so on. I apologize in advance for the somewhat helter-skelter organization -- the document is really only half baked. There is never enough time, unfortunately. The bright side is that there are plenty of holes for other people to fill in...

Security is always a continuum, and a set of tradeoffs. We can identify several objectives or requirements we wish to minimize or maximize: security, convenience of access, cost of implementation, speed of implementation, and cost of maintenance. Perfection is highly unlikely. However, in this particular case we can probably do a pretty darn good job.

Our one absolute functional requirement is that DNS and whois data be accessible to the net at a sufficient performance level. On the other hand, the registry database itself does not need to be easily available. This leads to the obvious conclusion that these functions can be separated. And indeed, while securing DNS and whois access is important, it is a separable problem, and I don't really address it here. I concentrate on the registry database operation and its interactions with the registrars and others.

Some time could be spent on prioritizing the other requirements, but I don't think it is worth doing unless there is serious controversy.

1.1 Outline

The document is arranged as follows:

1. Introduction
1.1 Outline
1.2 Note on terminology
1.3 Note on distribution
2. General security concerns
2.1 Risk assessment
2.2 Damage
2.3 Specific targets
2.3.1 Private keys
2.3.2 Registry database
2.3.3 Financial database
2.3.4 Other data
2.4 Probability of attacks
2.5 Attackers
2.6 Types of attacks
3. Specific architectural considerations
3.1 General
3.2 Interaction with registrars -- transitive security breaches
3.3 Internal design of the registry
3.3.1 Isolated
3.3.2 Special internal protocol
3.3.3 Standard LAN protocols
3.3.4 Fully connected
3.4 Discussion
3.4.1 Architecture
3.4.2 Performance issues
4. More detail on protocols
4.1 The "slow" protocol
4.2 FTP or other access to mission-oriented files
4.3 Direct DNS
4.4 Direct whois
4.5 Remote administration
4.6 Interactive access
5. Key management
5.1 Protecting keys
5.2 Generating new keys

1.2 Note on terminology

From the IAHC recommendations:

A REGISTRY comprises the roles and activities involved in the administration of a portion of the Domain name space. With respect to the work of the IAHC, a registry pertains to a single gTLD and encompasses all of the services needed for assignment and maintenance of that TLD and its registrations.

I use the term "registry" in the generic sense above, as well as in the specific sense of being restricted to a single TLD, because we need a general term for the function. Thus, coredb as a whole represents a registry, AND coredb is also composed of registries. So coredb IS the CORE registry, and the CORE registry is potentially composed of smaller registries. Furthermore, CORE as an organization may have business data not directly tied to a registry database.

1.3 Note on distribution

The issue of whether the CORE registry is subdivided is relevant to security for the following reason: one of the basic security choices is between 1) putting all your eggs in one heavily guarded basket, or 2) spreading your eggs among different baskets. In the first case a loss is rare but catastrophic; in the second case a loss is more frequent, but the damage is small. The recent problems with .com and .net are instructive, I believe, and there is an interesting lesson -- none of the other couple hundred TLDs were affected.
Contrasting one catastrophic failure with several small localized failures doesn't mean much by itself -- the sum of the small bits of damage could be greater than the damage from the one big failure. The true advantages of distribution are actually more subtle: a distributed system can be resilient instead of brittle; small disasters teach lessons, while catastrophes destroy; a distributed system can evolve, while a monolith has to be rebuilt from scratch; a distributed system can draw its resources from across the landscape, while a centralized system concentrates them in one area.

2. General security concerns

2.1 Risk assessment

There are various ways of thinking about risk -- the level of damage that might occur, specific targets that might be hit, the probability of various attacks, who the attackers might be, and the various methods or types of attacks. Books have been written on this subject, of course. Here are some views specific to registries.

2.2 Damage

The biggest risk is permanent disruption of DNS, that is, disruption of DNS that makes it impossible to quickly rebuild the zones. Depending on the number of domains lost, this could be a first order catastrophe -- a large part of the net would be effectively out of commission for an indefinite period. Such a failure would involve loss of backups, and would seem to require a large bomb to take out a machine and its backup tapes. [Note that bombing actually needs to be considered as a security issue. A catastrophic failure of DNS, wiping out half the net for months, is a potential terrorist goal.]

But a bomb is not necessary. A malicious and extremely competent cracker could break in to the database machine, subvert the process that does backups, wait for a few months, and then crash the machine and wipe out all its data. More likely, a disgruntled employee could do the same thing with far greater ease.
An incompetent employee who did backups incorrectly could also cause tremendous problems on the unlucky day the disks crash.

Short term seriously disruptive scenarios are much easier to imagine -- any trusted but disgruntled registrar employee could delete all the domains the registrar managed (those that didn't have a zone key associated with them, that is). And a registrar with lax security procedures is a threat to the whole system.

Loss of whois data would not be as disruptive as a loss of DNS, but it would still be a major hit -- rebuilding the data would be very costly.

However, while these major catastrophes are possible, there are many minor attacks that individually require little more than watchful attention to deal with. That attention is a cost in itself.

2.3 Specific targets

There are three targets of special interest -- private keys, the registry database, and the financial database.

2.3.1 Private keys

My assumption is that the registry database will have a private key that it uses to sign replies and, occasionally, to decrypt messages. Registrars will also have private keys, used to sign requests and other things. Interestingly enough, these will in fact be low security keys -- several people will need to have access to them, and they will be in constant use, so they have to be considered low security. They should be changed frequently. Theft of a key, therefore, is not as serious as it may first appear. Of course, a root compromise of a machine with keys means that the keys are compromised as well.

It will be necessary to keep databases of all revoked or expired public keys indefinitely, to validate old signatures. (This function is generally handled by a "Certificate Authority", or CA.) These databases need to be well protected -- but since they are public, and relatively small, they can be replicated widely.
The most critical thing about them is maintaining integrity -- if the registry itself is performing the CA function, it could generate a file containing all expired keys and sign it with its current key.

2.3.2 Registry database

The registry database is of course susceptible to damage. It goes without saying that the machine(s) it runs on should be reliable, that there should be extensive backups and maybe hot spares, and that it should be well protected. It is worth noting, however, that the registry database is completely redundant. It could, in principle, be completely rebuilt from either 1) the local records that registrars will keep, or 2) DNS and whois data. Such a rebuild, without special preparation, would take at least a day, perhaps several.

2.3.3 Financial database

The registry database will contain expiration times for all the domains, so there is no need for separate records of money owed on a per domain basis. However, there needs to be a much, much smaller database that keeps track of what each registrar owes (or, conversely, how much remains in a registrar's account). This, I presume, will be part of a small database the registry keeps on its registrars, which will include other material such as registrar keys. And while this database is so small as to seem incidental, it is an integral part of the whole system, and will be in constant use. Damage to this database would shut down the registry until it could be rebuilt.

2.3.4 Other data

- A database of public keys of registrars, and all past public keys. (Essentially Certificate Authority type data.)
- System configuration data
- Business data -- addresses, email, documents, etc.

2.4 Probability of attacks

In addition to the level of damage, the probability of an attack must be considered. In general this is quite hard to do -- every site is different, and reliable data on the frequency of attacks is very hard to come by.
The CORE DB, especially, is a unique case -- it will be an important component of the net infrastructure, in some ways very visible and in other ways not so visible, and it has a legacy of bitter controversy behind it.

2.5 Attackers

It is useful to consider the various categories of potential attackers of coredb:

As an identifiable critical component of the net infrastructure, coredb is a target for any anti-technology terrorist. In this case physical attacks are a real possibility.

Recent attacks against NSI by people apparently associated with the AlterNIC demonstrate quite clearly that technologically adept netizens with various axes to grind can cause significant disruption.

There is a quite large and varied population of crackers, with a wide variety of abilities. Some are barely literate; some are bored students; some have absolutely top-notch technical skills. However, the technical sophisticates have produced extremely powerful and easy to use tools that allow extremely dumb people to break in and cause serious damage. Most large corporate nets, I believe, get hits from this class on a regular basis. I expect "doorknob rattles", at least, from the first day coredb goes live.

Insiders are in many ways the most dangerous attackers of all, since they know every detail of the operation and have trusted access. Two things can mitigate the insider threat -- first, appropriate personnel policies, and second, a means of reconstructing the data from sources distributed through different organizations.

A sophisticated cracker will have a variety of tools available, and plenty of time. Such a cracker can infiltrate a large LAN over a period of months, take over machine after machine, install sniffers, trapdoors, and alarms in many places, and do nothing but use it as a base for further operations. Such individuals and groups do exist, but a registry database is probably not a really interesting target for them.
2.6 Types of attacks

There are a variety of attacks to consider: denial of service; attacks against the (as yet undefined) registry protocols; attacks through the standard application level protocols (telnet, FTP, HTTP, SMTP, rexec); lower level attacks against the IP protocol suite; social engineering attacks; and physical attacks on equipment or people. In the interests of space, time, and the reader's patience, I will not go through these attacks in detail.

3. Specific Architectural Considerations

3.1 Overview

A registry provides a network service, and so must be connected to the network. There are many possible interfaces with the Internet. Here are some appropriate for a registry:

1) A "slow" database modification and query interface for registrars, with email-level propagation speeds

2) FTP or other file transfer service for providing data -- zone files, whois master files, and other mission-oriented data [I use the term "whois" *very* loosely in this paper, by the way]

3) A direct interface for master DNS service

4) A direct interface for master whois service

5) A login interface to the database for remote administration by authorized administrators

6) A "fast" interactive database interface for registrars, possibly limited to queries

These 6 constitute the basic functional connectivity of a registry. We also might consider:

7) "Commercial" connectivity, for email, FTP, web, and other net access to support the business needs of the operation. This is difficult to secure. I mention it only for completeness -- it is obvious that commercial connectivity should be divorced from the functional connectivity of the registry, and, unless specifically mentioned, all subsequent discussion applies only to the above 6 categories of functional connectivity.
These interfaces are approximately ordered by my subjective view of their importance to the functioning of the registry -- I consider 1 through 4 as representing the essential functions of a registry (though 3 and 4 could be done remotely), and 5 and 6 as niceties. 7 is important to the business of the registry, but not to its function. They are also roughly in order as far as insecurity is concerned -- this basically follows from the general rule that systems with limited functionality are much easier to secure than general purpose systems. Note that 1, 2, 5, and 6 are tightly coupled, and would likely run on the same machine or LAN, whereas 3, 4, and 7 can be run peripherally, or even remotely.

3.2 Interaction with registrars -- transitive security breaches

Some of the functional interfaces, to a greater or lesser degree, extend trust to the registrars or elsewhere. So a security breach at a registrar can lead to a security breach of the registry, or damage to the registry data. Registrars will have general Internet access, web servers, email, FTP, a local database with business records, and employees with varied expertise and commitment. The registrars will be a diverse lot, and it probably is not feasible to enforce a strong security policy across all registrars. Additionally, registrars will be juicy targets.

We can expect with high probability, therefore, that a registrar's site will be cracked, with root compromises that persist undetected for some time. This could potentially compromise all of the registrar's data, including secret keys. Thus a cracker (or a disgruntled employee) would have ample opportunity to damage the registry data. I believe that the probability of such damage occurring over the next 5 years or so is very high -- so high that it would be completely irresponsible not to plan for it. Note that this damage could be done through normal use of the protocols -- no further cracking is necessary once a registrar is compromised.
The implications are very important: a registry MUST be designed to resist damage from a completely treacherous registrar.

3.3 Internal design of the registry

The central component of the registry is a database server. The data sets in this server are the crown jewels of the registry, and must be carefully protected, backed up, and archived. On the periphery are possibly email servers, DNS servers, whois servers, FTP servers, and other machines to handle billing and other business functions. I assume a firewall or screening router is always used. Here are some possible models:

3.3.1 Isolated

It is not strictly necessary that the database server be connected to the network -- a truly paranoid registry could run the server as a stand-alone machine, and carry diskettes full of email requests (received at a front end machine) and replies back and forth periodically. Every night a tape could be written to distribute the new zone files and whois data. Such an arrangement would reduce the risk of network attack on the data to essentially zero. Highest security, low implementation cost, human attendance required.

3.3.2 Special internal protocol

With some minimal loss of security the diskettes could be replaced with dedicated network links designed to transfer only requests and replies. Such an arrangement could approach interactive performance, but keep the data almost totally safe from any kind of network attack. However, these dedicated network links would have to be done carefully, and would not use a standard protocol, and thus would be somewhat expensive to implement. High security, high development cost, moderate maintenance.

3.3.3 Standard LAN protocols

In this case the front-end machine communicates with the database machine through a standard database-query protocol. [The front end, or a separate machine, also provides the output services.]
This is still substantially more secure than connecting the database machine directly to the net, because a cracker has to make it through the front end undetected. Good security, moderate development cost, moderate maintenance.

3.3.4 Fully connected

In this scenario the database machine is functionally connected to the net [through the screening router or firewall]. All the registrar-registry protocol processing and all the preprocessing [round robin enforcement] takes place on the database machine. Output services are also provided from the same machine. Lowest security, least cost, moderate maintenance.

3.4 Discussion

3.4.1 Architectural choice

At my most paranoid, I would recommend the "Isolated" model. From a performance standpoint it would give a registration turnaround of a few hours, and an activation turnaround of a day. The inclusion of humans in the processing loop -- even just moving diskettes and tapes around -- is a *big* drawback, however.

At the other end, the "Fully connected" model is very cheap and quick to implement, and could be used as an initial implementation using COTS software, moving to something more secure over time. However, in addition to security considerations there are other, operational reasons to separate the database machine from a front end -- for one, the database machine should be a *very* high reliability machine (if not replicated).

This leads to the following, obvious, architecture, which I will consider the canonical architecture: a front end machine, connected to the net through a screening router or firewall, that handles the registry-registrar protocol. This machine is connected in turn to the database server(s) via an internal LAN. The database server also connects to DNS/whois/FTP servers over this same LAN, and these output servers are in turn connected to the Internet through a firewall. There are definite security advantages to having a separate machine perform each function, but there are cost tradeoffs as well.
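The canonical architecture above can be thought of as a graph of permitted connections. Here is a minimal sketch, in Python, that checks the key property -- the database is reachable from the Internet only through the front end. All node names and the exact set of links are my own illustrative assumptions, not part of any actual design:

```python
# Hypothetical connection graph for the canonical architecture:
# each pair (a, b) means "a is allowed to open a connection to b".
ALLOWED = {
    ("internet", "front_end"),                       # RR protocol, via firewall
    ("internet", "dns"),                             # public DNS service
    ("internet", "whois"),                           # public whois service
    ("internet", "ftp"),                             # public file access
    ("front_end", "db"),                             # database-query protocol, internal LAN
    ("db", "dns"), ("db", "whois"), ("db", "ftp"),   # db pushes updates to output servers
}

def reachable(start, blocked=frozenset()):
    """Set of nodes reachable from `start`, skipping any `blocked` nodes."""
    adj = {}
    for a, b in ALLOWED:
        adj.setdefault(a, set()).add(b)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return seen

# The database can be reached (indirectly) in normal operation...
assert "db" in reachable("internet")
# ...but only via the front end: remove it, and no path to the db remains.
assert "db" not in reachable("internet", blocked={"front_end"})
```

The point of the sketch is simply that every inbound path to the crown jewels funnels through one limited-function machine, which is exactly the property that makes the front end worth hardening.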
Note that the DNS/whois servers, in particular, could be sited remotely, and pick up their updates from an FTP site. VPNs between sites could keep these updates secure -- modern firewalls (e.g. Gauntlet, by TIS) support these functions at low cost. In this case, a single front end (perhaps replicated for robustness) could handle the RR protocol and provide FTP service. Here's a crude diagram:

I |   ------        -------------      ------
N |---| FW |--------| Front end |------| DB |
T |   ------        |  & ftp    |      ------
E |                 -------------
R |   -----------
N |---| TLD DNS |
E |   -----------
T |   ---------
  |---| whois |
      ---------

3.4.2 Performance issues

[Caveat -- I am not a database expert.]

If the database does not support direct queries, the performance demands on the primary database are not very high at all: a million registrations in a year is less than 3000 database updates a day -- a few every minute. Given that there are currently around a million domain names total, any database software whatsoever could handle the update load. And even if contact info is updated ten times as frequently as domain names are created, we are nowhere near high end database performance.

There is perhaps a bigger performance issue in the "report" generation phase (producing files containing whois and DNS zone file data), or if direct queries to the database were allowed. On the face of it, I see no reason at all to support direct queries. If public whois/DNS data is current to within one day, the vast majority of conflicting cases can be detected through that medium. Without a mechanism that allows a registrar to lock a name, the possibility of a close call (where you check the database, find the name free, register it, and get an error return because someone else just registered it) cannot be eliminated. But in practice there is little difference between locking a name and registering it. [Note: In earlier discussions about a protocol I believed that some kind of locking mechanism was necessary.
But after thinking about it in detail, and discussing it with several other people, I now believe that a name lock is superfluous.]

4. The protocols, in a little more detail

4.1 The "slow" registry protocol

Though an ad hoc protocol may be used, the model is email. Requests are received, queued, and processed; the results are sent out via the same protocol. The server end must have a daemon listening constantly for requests; the client host may have a server listening for replies, or the client application software may maintain an open connection for a reply. Authentication is primarily through digital signatures, with IP address/domain name validation in addition.

The server side of this protocol can be very secure -- the only likely attack would be a denial of service (DoS) attack, through flooding. The server should authenticate all transactions with digital signatures. In addition, the set of registrar addresses/domain names that are allowed to send is known, and relatively small, so filtering on IP address/domain name (with router filtering, a firewall, or TCP wrappers) should be done. That is, requests from unauthorized addresses would be junked before the signature was ever checked.

However, client side security is more complex. The client host will undoubtedly be connected with the registrar's database and billing software, and will likely have general Internet connectivity. Requiring all registrars to adhere to strict security policies is probably not feasible politically. So a complete crack of a registrar's site is quite possible, which would mean that all of its keys were compromised. In fact, if the cracker is careful and knowledgeable he could register domain names for some time before being caught. As a matter of recommended practice it would be best if each registrar had a single host that was used to make changes to the registry database, and that host were used only for that purpose.
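The server-side acceptance logic described above -- junk requests from unknown addresses before any signature work is done -- can be sketched briefly. Since the registry protocol is as yet undefined, everything here is an assumption for illustration: the registrar addresses are invented, and an HMAC over a shared secret stands in for the real public-key digital signature:

```python
import hashlib
import hmac

# Hypothetical table of known registrar source addresses and their keys.
# (A real deployment would verify public-key signatures, not HMACs.)
REGISTRARS = {
    "192.0.2.10": b"registrar-a-secret",
    "192.0.2.20": b"registrar-b-secret",
}

def accept_request(src_addr, payload, signature):
    """Return True only for a validly signed request from a known address."""
    key = REGISTRARS.get(src_addr)
    if key is None:
        # Unknown sender: discard immediately, before any signature
        # check -- cheap filtering absorbs most of a flooding attack.
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

good_sig = hmac.new(b"registrar-a-secret", b"register example.web",
                    hashlib.sha256).digest()
assert accept_request("192.0.2.10", b"register example.web", good_sig)
assert not accept_request("203.0.113.5", b"register example.web", good_sig)
assert not accept_request("192.0.2.20", b"register example.web", good_sig)
```

The ordering is the point: address filtering is nearly free, while signature verification is comparatively expensive, so the cheap check goes first.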
Note that read-only queries of the registry database need not be so restricted. [But the public whois database would serve 99% of the time.]

4.2 FTP or other access to mission-oriented files

Part of the mission of a registry is to make its information available to the public. One method is to make available files containing all the information. The obvious implementation is FTP. Making an FTP server secure is a well-understood problem. Furthermore, if the FTP server is on a front end machine, the probability of containing any damage is pretty high.

4.3 Direct DNS

As recent events have demonstrated, current implementations of DNS are not secure. It will absolutely be necessary to use the latest implementation of BIND that supports secure DNS. As indicated above, however, I believe the public DNS servers should be separated from the registry.

4.4 Direct whois

[To be discussed later]

4.5 Remote administration

Remote administration of the database server, even through a relatively secure channel such as ssh, is probably not a good idea. It means that you must extend trust to the machine from which the remote login is coming -- a machine outside the firewall, and of uncertain security. A compromised database administrator account could do tremendous damage. The convenience is not worth the risk, IMO.

4.6 Interactive DB access

Interactive DB access is much less of a security issue than administrator access, especially if access is limited to queries. However, for the reasons outlined above, I think such access is an unnecessary luxury that would add a lot to the cost of the total system.

5. Key management

5.1 Protecting keys

I mentioned above that the private keys in the system are low security keys. I base this statement on the fact that many people will be using them -- every registrar employee who sends in a registration will use the registrar's key to sign the request, for example.
The conventional method of securing a key is to encrypt it, and to use a human-supplied password to decrypt the key when it is used. As a consequence, to use the key you need three things -- the encrypted key, the software needed to decrypt it (that is, the encryption algorithm), and the password. Normally, the security of the key is preserved through careful control of the password, but that control is much harder to maintain when multiple people use the same password. Schemes for different passwords to the same object could be used, or other, more complex schemes, but the fundamental problem of human failure remains. The situation can be mitigated somewhat by very carefully protecting the key itself, and by arranging the software so that employees can only access the key through a secure software interface.

5.2 Generating new keys

Keys should be regenerated on a regular basis, and whenever a problem arises. Since the registry database is central, it makes sense to use it as a coordinator for new key generation. However, registrar keys should be generated locally -- it would be a bad idea for the registry db to generate all the keys. Therefore a coordination protocol should be developed. [This needn't be complicated -- in the normal case it is completely straightforward, I believe. In the case where a key has been seriously compromised things are a little more complicated, but not seriously so.]
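The conventional arrangement described in 5.1 -- a private key stored encrypted, unlocked by a human-supplied password at time of use -- can be sketched as follows. This is a toy: the password-to-key derivation (PBKDF2) is a standard construction, but the XOR keystream cipher here is purely illustrative, and a real system would use a vetted cipher. The key material and password are of course invented:

```python
import hashlib
import os

def _keystream(secret, n):
    # Expand a derived secret into n pseudo-random bytes (toy construction).
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_key(private_key, password, salt):
    # Derive an encryption key from the human-supplied password...
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
    # ...and mask the private key with it (illustrative XOR cipher only).
    stream = _keystream(derived, len(private_key))
    return bytes(a ^ b for a, b in zip(private_key, stream))

def decrypt_key(blob, password, salt):
    return encrypt_key(blob, password, salt)  # XOR is its own inverse

salt = os.urandom(16)
key = b"-----hypothetical registrar private key-----"
blob = encrypt_key(key, b"correct password", salt)

assert blob != key                                      # stored form is masked
assert decrypt_key(blob, b"correct password", salt) == key
assert decrypt_key(blob, b"wrong password", salt) != key
```

Note how the sketch exhibits exactly the three things the text identifies: the encrypted key (`blob`), the algorithm (the functions), and the password -- and how nothing in it addresses the real weakness, which is many employees sharing that one password.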