[TECH] Numbering Schemes for Legacy Candidates

To: cve-editorial-board-list@lists.mitre.org
Subject: [TECH] Numbering Schemes for Legacy Candidates
From: "Steven M. Christey" <coley@linus.mitre.org>
Date: Thu, 5 Apr 2001 18:50:36 -0400 (EDT)
Delivery-Date: Thu Apr 5 18:50:44 2001
Sender: owner-cve-editorial-board-list@lists.mitre.org

All,

MITRE expects to be proposing legacy candidates to the Board within a
month or two.  However, as Board members discovered during an
unexpectedly lengthy discussion at the face-to-face meeting, there are
various ways that we could choose to number those legacy candidates.
Each has strengths and weaknesses.

Legacy candidates cannot be proposed until we've decided how they
should be numbered.

The meeting summary, included below, proposes 3 options:

1) Assignment-based encoding (i.e. the current approach) - use the
   year that the number was assigned, e.g. CAN-2001-XXXX

2) Publication-based encoding - use the year that the issue was first
   disclosed to the public, e.g. CAN-1997-XXXX

3) Hybrid encoding - use 1999-XXXX for all 1999 issues and earlier,
   and use publication-based encoding for later years.

There may be other options.  Slight variations are identified in the
summary.

While discussing this, we should account for:

1) Maintenance costs for CVE-compatible products/vendors

2) Possible confusion/misunderstanding by CVE users

3) Impact on CVE candidates and entries that already have numbers
   assigned to them


While discussing this, we should also remain open to the idea of
switching to an entirely different numbering scheme for candidates and
entries, as discussed at the meeting.  It will be easier to switch
schemes sooner rather than later.  (See "Rethinking the CVE naming
scheme" in the summary at
http://cve.mitre.org/board/archives/2001-03/msg00014.html)

Details are below.

- Steve



Numbering Scheme For Legacy Candidates
--------------------------------------

The current approach for numbering candidates is CAN-YYYY-NNNN, where
YYYY is the year in which the candidate number is assigned - *not* the
year in which the vulnerability is publicized.  (The same applies to
CVE entry names, in which the CAN- prefix is converted to CVE-, but
the rest of the identifier remains the same).
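
For anyone who handles these names in software, the layout described
above can be sketched as follows. This is only an illustration, not
MITRE code: the helper names parse_name and promote are hypothetical,
and the pattern assumes exactly four digits in each slot, as in the
current scheme.

```python
import re

# Layout assumed from the text above: CAN-YYYY-NNNN or CVE-YYYY-NNNN.
NAME_RE = re.compile(r"(CAN|CVE)-(\d{4})-(\d{4})")

def parse_name(name):
    """Split a name into (prefix, year, sequence number)."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError("not a CAN/CVE name: %r" % name)
    return m.group(1), int(m.group(2)), int(m.group(3))

def promote(name):
    """Convert a candidate name to an entry name: only the prefix changes."""
    _prefix, year, seq = parse_name(name)
    return "CVE-%04d-%04d" % (year, seq)
```

Note that the year returned by parse_name is the year of assignment
under the current approach, so it should not be read as a publication
date.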

If applied to legacy candidates, the current approach would produce
many vulnerabilities that have CAN-2001-NNNN names, but were
publicized in 1999 and earlier.  Many people may have the perception
that the "year" slot in the CVE name is actually related to the year
in which the vulnerability is publicized.  In addition, there may be
an "aesthetic" desire to have the candidate name reflect the year of
publication, as a high-level means of managing vulnerability
information based on how old it is.

Regardless of the public's misconception of the meaning of the year in
the CVE name, many of the candidates/entries with 1999-NNNN names were
actually announced earlier than 1999.  For the most part, CVE items
with 2000-NNNN names were publicized in 2000, but some 2000-NNNN names
are for older issues.  Similarly, some problems discovered in November
and December 2000 have 2001-NNNN names.  So, many of the current
CVE/CAN names are already inconsistent with respect to the year of
publication for the associated vulnerabilities.

However, depending on the number of new vulnerabilities discovered
this year, and the number of legacy candidates that MITRE produces, it
is possible that more than 10,000 candidate names will need to be
assigned this year.  The name space only supports up to 9999 different
names per year.  (This is being referred to as "the CAN-10K problem".)
There are ways of extending the name space if necessary; for example,
by moving to a hexadecimal numbering scheme instead of decimal, 65536
different vulnerabilities could be assigned per year without changing
the number of characters in the name.  However, it is not known
whether such a change would adversely impact how some CVE-compatible
products may store the CVE names.
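
The arithmetic behind the CAN-10K problem can be checked with a short
sketch. The function names are hypothetical. One assumption to flag:
the 9999 figure implies that 0000 is not used as a name, while the
65536 figure for hexadecimal counts every four-character value
including 0000, so the reserve_zero flag below makes that difference
explicit.

```python
def name_space_size(digits=4, base=10, reserve_zero=True):
    """Count the sequence numbers available in one year's name space."""
    total = base ** digits
    # Assumption: NNNN = 0000 may or may not be usable as a name.
    return total - 1 if reserve_zero else total

def format_seq(seq, base=10, digits=4):
    """Render a sequence number in the fixed-width NNNN slot."""
    if base == 10:
        return "%0*d" % (digits, seq)
    if base == 16:
        return "%0*X" % (digits, seq)
    raise ValueError("unsupported base: %d" % base)

print(name_space_size())                             # 9999
print(name_space_size(base=16, reserve_zero=False))  # 65536
print(format_seq(10000, base=16))                    # 2710
```

The last line shows one source of risk for compatible products: a
hexadecimal name such as 2710 still looks like a decimal number, and a
name such as 00FF would not fit in a field that stores NNNN as an
integer.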

There are several factors that need to be considered when selecting a
naming scheme for legacy candidates:

  1) It was suggested that, to help users manage CVE based on date of
     publication, a separate field could be added which lists such a
     date.  However, this information is already in other databases,
     so adding it would increase the risk of CVE competing with those
     databases.  In addition, the date of publication would
     rarely help someone in mapping a specific vulnerability to a CVE
     name, although it is occasionally useful in discriminating
     between extremely similar vulnerabilities in the same product.
     The references and description, on the other hand, are essential
     for looking up the CVE name for a problem.

  2) If a naming scheme is adopted which encodes the year of
     publication in the name, then all the entries or candidates that
     currently exist should *NOT* be renamed just to make things
     consistent, since there is such a large number of
     CAN/CVE-1999-NNNN items that were not discovered in 1999.  The
     maintenance costs would be high for CVE users and compatible
     vendors.  Thus, even if the Board adopts a "publication-based"
     encoding, it will not apply to all CVE items.

  3) If a publication-based encoding is not used, then users will need
     to be able to distinguish between new candidates and legacy
     candidates.  This information could be provided in reports that
     are accessible on the CVE web site.

Following are the different approaches that were discussed by the
Board.

  1) Assignment-based encoding: continue to use the current approach,
     i.e. assign CAN-2001-NNNN (or CVE-2001-NNNN) to the issue, since
     2001 is the year in which the candidate is assigned.

     Pro: emphasizes that the encoding of the year in the CVE name is
     not related to the year of announcement.  Simplest to implement.

     Con: increases the possibility of misuse by consumers.  For
     example, consumers might focus on the last 200 CAN-2001-NNNN
     candidates, mistakenly believing that they are the most recent
     issues that have been discovered.

     Con: susceptible to the CAN-10K problem.

  2) Publication-based encoding: assign YYYY-NNNN, where YYYY is the
     year in which the issue was first publicized.

     Pro: more likely to address common public misconceptions.

     Pro: least susceptible to the CAN-10K problem.

     Pro: will be accurate for all issues that are publicized in 2002
     and later.

     Con: 1999-NNNN, and portions of 2000-NNNN and 2001-NNNN, will not
     be consistent with the approach.

     Con: some complications if a vulnerability is publicized one
     year, a candidate is created, and then it becomes clear that the
     problem was actually publicized in earlier years.

  3) Hybrid encoding.  If the problem was published in 1999 or
     earlier, use 1999-NNNN; otherwise, use the year of publication.

     Pro: emphasizes that the encoding of the year in the CVE name is
     not related to the year of publication, for any issues that were
     publicized in 1999 or earlier.

     Pro: For 2002 and later years - and all but the first 150
     candidates of 2001 - the year of publication will be encoded in
     the CVE name.

     Con: For 2000 and 2001, some candidate names will not include the
     year of publication.

     Con: most susceptible to the CAN-10K problem, because there may
     be more than 10,000 vulnerabilities from 1999 and earlier that
     ultimately require names.
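
To make the three approaches concrete, here is a minimal sketch of how
the year slot would be chosen under each one. The function name
name_year and the scheme labels are hypothetical; the 1999 cutoff for
the hybrid case is taken from the description above.

```python
def name_year(scheme, assigned_year, published_year):
    """Return the YYYY that would appear in the name under a given scheme."""
    if scheme == "assignment":
        # Current approach: the year the number is assigned.
        return assigned_year
    if scheme == "publication":
        # The year the issue was first publicized.
        return published_year
    if scheme == "hybrid":
        # 1999 for anything published in 1999 or earlier,
        # otherwise the year of publication.
        return max(published_year, 1999)
    raise ValueError("unknown scheme: %r" % scheme)

# A legacy issue publicized in 1997 and assigned a number in 2001:
for scheme in ("assignment", "publication", "hybrid"):
    print(scheme, name_year(scheme, 2001, 1997))
# assignment 2001
# publication 1997
# hybrid 1999
```

The sketch applies only to newly assigned names. As noted above,
existing names would not be renamed, so none of the schemes describes
every item already in CVE.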

To help end users better manage or discriminate between new and legacy
issues, and highlight the differences for purposes of user education,
it was suggested that legacy candidates be assigned names with the
highest sequence numbers available.  For example, one could start at
CAN-2001-9999, CAN-2001-9998, etc. for legacy candidates that are
created in 2001.  An alternate scheme was proposed in which legacy
issues would be numbered starting at CAN-2001-5000 and advancing
upward.  However, such large gaps in the sequence could cause
confusion.  In
addition, some people may be using the highest-numbered candidate for
other purposes, which could have unexpected consequences.  For
example, it was noted that some people mirror individual CVE entries
and candidates on the CVE web site by requesting information up to 100
"sequence numbers" above the most recent candidate; such mirrors might
immediately attempt to download 10,000 candidates for 2001, instead of
the current number of about 250.
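
The mirroring concern can be illustrated with a rough sketch. How such
mirrors actually work was not described in detail, so the logic here is
an assumption: the mirror takes the highest-numbered candidate it knows
of as "the most recent" and requests every sequence number from 1 up to
100 above it, capped at 9999. The names mirror_request_range, LOOKAHEAD,
and MAX_SEQ are hypothetical.

```python
MAX_SEQ = 9999    # largest NNNN in the current name space
LOOKAHEAD = 100   # how far past the highest known number the mirror probes

def mirror_request_range(assigned):
    """Sequence numbers an (assumed) mirror would request for one year."""
    highest = max(assigned)
    upper = min(highest + LOOKAHEAD, MAX_SEQ)
    return range(1, upper + 1)

current = set(range(1, 251))         # about 250 candidates so far in 2001
top_down = current | {9999, 9998}    # legacy names assigned from the top
from_5000 = current | {5000, 5001}   # legacy names assigned from 5000 up

print(len(mirror_request_range(current)))    # 350
print(len(mirror_request_range(top_down)))   # 9999
print(len(mirror_request_range(from_5000)))  # 5101
```

Under this assumption, two legacy candidates at the top of the range
are enough to make the mirror request nearly the whole name space.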

Regardless of which approach is adopted, users will need to be
educated about it.  It is likely that additional information will be
needed on the CVE site; in addition, CVE-compatible vendors will need
to address user misconceptions.

Also, MITRE will not be able to start proposing legacy candidates
until the numbering scheme is finalized.

Follow-Ups:
- Re: [TECH] Numbering Schemes for Legacy Candidates
  - From: Pascal Meunier <pmeunier@cerias.purdue.edu>
