FreshPorts - VuXML strategy

This page documents the strategy used when processing VuXML data.

Major Goal

The major goal is to flag records [within the Commit History for a given port] that are affected by a VuXML entry. The key to this is the ability to parse the XML provided in ports/security/vuxml/vuln.xml and then associate that information with a given commit.

Processing Strategy

Storing all of the relevant VuXML data within FreshPorts gives us the most flexibility and also divides up the process into two distinct stages:

  1. Parse the VuXML data and load it into FreshPorts
  2. Find commits that are affected by a given VuXML entry and mark it as affected.

Parsing the XML

Matthew Seaman has written a script (scripts/vuxml_parsing.pm) that provides us with the information we need.

The data within the VuXML file is parsed only for package information. We do not collect operating system information.

We have taken the initial approach that we will delete all previous VuXML information when processing a newly committed vuln.xml file. This simplifies the code. We may wish to rethink this approach at a later date.

Marking the commits - some database background

Technically speaking, we are not marking the commits. We are marking the commit history of parts that have been affected by a given VuXML entry.

FreshPorts stores each commit in the commit_log table. The ports contains a row for each port. The commit_log_ports table relates a given commit to the ports it touches. It is this data which forms the basis for the Commit History for a port.

The commit_log_ports table also records the PORTVERSION, PORTREVISION, and PORTEPOCH values of the port as a result of that commit. It is this information which will be used to test against the VuXML information.

Marking the commits - a proposed algorithm

Once the data is loaded into FreshPorts, we can obtain the names and ranges like this:

1 - for each range record
2 -     select all commit log records for the name provided
3 -     while name stays the same
4 -         test the version in thie commit log record
5 -         if affected, flag the record
6 -     end while
7 -     Add entries to commit_log_ports_vuxml table
8 - end for

Going into details

There is room for optimization here. Here are some notes based upon first impressions.

The combination of port_version, port_revision, and port_epoch will be known as the PackageVersion.

What else?

That's it. Sounds simple. Right?

Historical EPOCH

I'll tell what's up. PORTEPOCH on historical commits. Now that we have the code written that updates commit_log_ports_vuxml, we are running into a problem with 'pkg_version -t' being supplied with the wrong data, which causes the code to mark the wrong commits as being affected. This problem appears to occur only on ports that have a non-zero PORTEPOCH.

The plan

We know what ports have PORTEPOCH:

freshports.org=# select count(*) from ports where portepoch != '0';
 count
-------
   246
(1 row)

freshports.org=#

For a given port, we can see where the PORTVERSIONs have gone backwards:

freshports.org=#  select CLP.*
freshports.org-#    from commit_log_ports CLP, commit_log CL
freshports.org-#   where port_id = (select id from ports_active where name = 'scrollkeeper')
freshports.org-#     AND CLP.commit_log_id = CL.id
freshports.org-#  ORDER BY CL.commit_date desc;
 commit_log_id | port_id | needs_refresh | port_version | port_revision | port_epoch | package_name
---------------+---------+---------------+--------------+---------------+------------+--------------
        136719 |    5173 |             0 | 0.3.14       | 1             | 0          |
        134479 |    5173 |             0 | 0.3.14       | 1             | 0          |
        120086 |    5173 |             0 | 0.3.14       | 1             | 0          |
        119323 |    5173 |             0 | 0.3.14       | 1             | 0          |
        113057 |    5173 |             0 | 0.3.14       | 1             | 0          |
        111215 |    5173 |             0 | 0.3.14       | 0             | 0          |
        111840 |    5173 |             0 | 0.3.14       | 0             | 0          |
        105035 |    5173 |             0 | 0.3.14       | 0             | 0          |
        102907 |    5173 |             0 | 0.3.12       | 4             | 0          |
        101412 |    5173 |             0 | 0.3.12       | 4             | 0          |
        101380 |    5173 |             0 | 0.3.12       | 3             | 0          |
         92391 |    5173 |             0 | 0.3.12       | 2             | 0          |
         92221 |    5173 |             0 | 0.3.12       | 1             | 0          |
         83303 |    5173 |             0 | 0.3.12       | 1             | 0          |
         77278 |    5173 |             0 | 0.3.12       | 1             | 0          |
         77135 |    5173 |             0 | 0.3.12       | 1             | 0          |
         76924 |    5173 |             0 | 0.3.12       | 0             | 0          |
         76520 |    5173 |             0 | 0.3.12       | 0             | 0          |
         76228 |    5173 |             0 | 0.3.11       | 8             | 0          |
         71534 |    5173 |             0 | 0.3.11       | 8             | 0          |
         66305 |    5173 |             0 | 0.3.11       | 8             | 0          |
         65348 |    5173 |             0 | 0.3.11       | 7             | 0          |
         65013 |    5173 |             0 | 0.3.11       | 6             | 0          |
         64703 |    5173 |             0 | 0.3.11       | 6             | 0          |
         63626 |    5173 |             0 | 0.3.11       | 5             | 0          |
         52718 |    5173 |             0 | 0.3.11       | 4             | 0          |
         52705 |    5173 |             0 | 0.3.11       | 3             | 0          |
         50551 |    5173 |             0 | 0.3.11       | 2             | 0          |
         49410 |    5173 |             0 | 0.3.11       | 2             | 0          |
         48239 |    5173 |             0 | 0.3.11       | 1             | 0          |
         47310 |    5173 |             0 | 0.3.11       | 1             | 0          |
         45848 |    5173 |             0 | 0.3.11       | 1             | 0          |
         45510 |    5173 |             0 | 0.3.11       | 0             | 0          |
         44355 |    5173 |             0 | 0.3.11       | 0             | 0          |
         43795 |    5173 |             0 | 0.3.11       | 0             | 0          |
         43479 |    5173 |             0 | 0.3.11       | 0             | 0          |
         41741 |    5173 |             0 | 0.3.9        |               | 0          |
         40451 |    5173 |             0 | 0.3.9        | 0             | 0          |
         39869 |    5173 |             0 | 0.3.9        | 0             | 0          |
         38883 |    5173 |             0 | 0.3.9        | 0             | 0          |
         38822 |    5173 |             0 | 0.3.9        | 0             | 0          |
         37805 |    5173 |             0 | 0.2          | 0             | 0          |
         37804 |    5173 |             0 | 0.2          | 0             | 0          |
         36042 |    5173 |             0 | 0.2          | 0             | 0          |
         35969 |    5173 |             0 | 0.3.6        | 0             | 0          |
         15736 |    5173 |             0 |              |               | 0          |
         13030 |    5173 |             0 |              |               | 0          |
         12799 |    5173 |             0 |              |               | 0          |
         12147 |    5173 |             0 |              |               | 0          |
         12145 |    5173 |             0 |              |               | 0          |
         12060 |    5173 |             0 |              |               | 0          |
(51 rows)

freshports.org=#

The bold line indicates where the version went down from the previous commit. Commits after that one should have a PORTEPOCH != '0'. It is important to scan upwards here, not down. If we scan up, we find the first change. By comparing that PORTEPOCH to the current PORTEPOCH, we can detect if there has been more than one PORTEPOCH change.

Here is how we can track down the PORTEPOCH value:

freshports.org=# select category from ports_active where name = 'scrollkeeper';
 category
----------
 textproc
(1 row)

We now know the category.

freshports.org=# select pathname_id('ports/textproc/scrollkeeper/Makefile');
 pathname_id
-------------
       58214
(1 row)

We now know the element id for the Makefile for this port.

freshports.org=# select * from commit_log_elements where commit_log_id = 36042 and element_id = 58214;
   id   | commit_log_id | element_id | revision_name | change_type
--------+---------------+------------+---------------+-------------
 145950 |         36042 |      58214 | 1.7           | M
(1 row)

And now we know the CVS revision for the Makefile which was created by this commit.

This URL gets us that revision: FRESHPORTS_FREEBSD_CVS_URL/~checkout~/ports/textproc/scrollkeeper/Makefile?rev=1.7&content-type=text/plain

From that, we can get PORTEPOCH. If that is not equal to the current value of ports.port_epoch, we know there has been more than one change of PORTEPOCH, and we need to keep scanning. If not, we can set the commit_log_ports records accordingly. Something like this:

  update commit_log_ports set port_epoch='1' where port_id = 7366 and
   commit_log_id >= 57525;

That's a first crack at how to solve the historical PORTEPOCH issue.

We might also want to adjust only those ports which have PORTEPOCH != 0 and that have a vuln entry:

select distinct name
  from ports_active PA, commit_log_ports_vuxml CLPV
 WHERE PA.id = CLPV.port_id
   AND PA.portepoch != '0';

A better idea

What we need is a list of all the commits, and the Makefile revision that goes along with that commit. We can then fetch the Makefile, extract PortEpoch, and assign the value to the database.

A list of all the commits for port 1277:

--
-- Gives you all the commits for a port
--

    SELECT P.id,
           CLP.commit_log_id,
           CLP.port_id,
           CLP.port_version,
           CLP.port_revision,
           CLP.port_epoch,
           CL.commit_date,
           element_pathname(P.element_id)
      FROM ports               P,
           commit_log_ports    CLP,
           commit_log          CL
     WHERE P.id              = 1277
       AND P.id              = CLP.port_id
       AND CLP.port_version != ''
       AND CLP.commit_log_id = CL.id;

  id  | commit_log_id | port_id | port_version | port_revision | port_epoch |      commit_date       |       element_pathname
------+---------------+---------+--------------+---------------+------------+------------------------+-------------------------------
 1277 |         62162 |    1277 | 1.2          | 0             | 0          | 2003-01-02 13:24:08-05 | /ports/devel/mingw-bin-msvcrt
 1277 |         74202 |    1277 | 1.2          | 0             | 0          | 2003-04-05 00:09:40-05 | /ports/devel/mingw-bin-msvcrt
 1277 |         69460 |    1277 | 1.2          | 0             | 0          | 2003-02-21 06:23:04-05 | /ports/devel/mingw-bin-msvcrt
 1277 |         98320 |    1277 | 1.2          | 0             | 0          | 2003-10-15 12:51:15-04 | /ports/devel/mingw-bin-msvcrt
 1277 |        112146 |    1277 | 1.2          | 0             | 0          | 2004-01-29 02:24:56-05 | /ports/devel/mingw-bin-msvcrt
 1277 |         97881 |    1277 | 1.2          | 0             | 0          | 2003-10-13 01:44:08-04 | /ports/devel/mingw-bin-msvcrt
(6 rows)

This will give you all the commits that affect a Makefile:

--
-- Gives you all commits that touch Makefiles
--
  SELECT P.id,
         CLP.commit_log_id,
         CLP.port_id,
         CLP.port_version,
         CLP.port_revision,
         CLP.port_epoch,
         CL.commit_date,
         element_pathname(CLE.element_id),
         CLE.revision_name
    FROM ports               P, 
         commit_log_ports    CLP,
         element             E,
         commit_log_elements CLE,
         commit_log          CL
   WHERE P.id              = 1277
     AND P.id              = CLP.port_id
     AND CLE.commit_log_id = CLP.commit_log_id
     AND CLE.element_id    = E.id
     AND E.name            = 'Makefile'
     AND E.parent_id       = P.element_id
     AND CLP.commit_log_id = CL.id
     AND CLP.port_version != ''
ORDER BY CL.commit_date;

  id  | commit_log_id | port_id | port_version | port_revision | port_epoch |      commit_date       |            element_pathname            | revision_name
------+---------------+---------+--------------+---------------+------------+------------------------+----------------------------------------+---------------
 1277 |         62162 |    1277 | 1.2          | 0             | 0          | 2003-01-02 13:24:08-05 | /ports/devel/mingw-bin-msvcrt/Makefile | 1.5
 1277 |         69460 |    1277 | 1.2          | 0             | 0          | 2003-02-21 06:23:04-05 | /ports/devel/mingw-bin-msvcrt/Makefile | 1.6
 1277 |         97881 |    1277 | 1.2          | 0             | 0          | 2003-10-13 01:44:08-04 | /ports/devel/mingw-bin-msvcrt/Makefile | 1.7
 1277 |         98320 |    1277 | 1.2          | 0             | 0          | 2003-10-15 12:51:15-04 | /ports/devel/mingw-bin-msvcrt/Makefile | 1.8

What we need is a join of the two result sets:

--
-- Gives you all commits and mentions makefile revisions
--

--
-- Gives you all the commits for a port
--

    SELECT C.commit_log_id,
           C.port_id,
           C.port_version,
           C.port_revision,
           C.port_epoch,
           C.commit_date,
           C.pathname,
           M.revision_name
FROM
    (SELECT P.id,
           CLP.commit_log_id,
           CLP.port_id,
           CLP.port_version,
           CLP.port_revision,
           CLP.port_epoch,
           CL.commit_date,
           element_pathname(P.element_id) as pathname
      FROM ports               P,
           commit_log_ports    CLP,
           commit_log          CL
     WHERE P.id              = 1277
       AND P.id              = CLP.port_id
       AND CLP.port_version != ''
       AND CLP.commit_log_id = CL.id) AS C left outer join

(  SELECT P.id,
         CLP.commit_log_id,
         CLP.port_id,
         CLP.port_version,
         CLP.port_revision,
         CLP.port_epoch,
         CL.commit_date,
         element_pathname(CLE.element_id),
         CLE.revision_name
    FROM ports               P, 
         commit_log_ports    CLP,
         element             E,
         commit_log_elements CLE,
         commit_log          CL
   WHERE P.id              = 1277
     AND P.id              = CLP.port_id
     AND CLE.commit_log_id = CLP.commit_log_id
     AND CLE.element_id    = E.id
     AND E.name            = 'Makefile'
     AND E.parent_id       = P.element_id
     AND CLP.commit_log_id = CL.id
     AND CLP.port_version != '') AS M

on (M.commit_log_id = C.commit_log_id)

order by C.commit_date;

 commit_log_id | port_id | port_version | port_revision | port_epoch |      commit_date       |           pathname            | revision_name
---------------+---------+--------------+---------------+------------+------------------------+-------------------------------+---------------
         62162 |    1277 | 1.2          | 0             | 0          | 2003-01-02 13:24:08-05 | /ports/devel/mingw-bin-msvcrt | 1.5
         69460 |    1277 | 1.2          | 0             | 0          | 2003-02-21 06:23:04-05 | /ports/devel/mingw-bin-msvcrt | 1.6
         74202 |    1277 | 1.2          | 0             | 0          | 2003-04-05 00:09:40-05 | /ports/devel/mingw-bin-msvcrt |
         97881 |    1277 | 1.2          | 0             | 0          | 2003-10-13 01:44:08-04 | /ports/devel/mingw-bin-msvcrt | 1.7
         98320 |    1277 | 1.2          | 0             | 0          | 2003-10-15 12:51:15-04 | /ports/devel/mingw-bin-msvcrt | 1.8
        112146 |    1277 | 1.2          | 0             | 0          | 2004-01-29 02:24:56-05 | /ports/devel/mingw-bin-msvcrt |
(6 rows)

Now we can parse that, fetching the Makefile we need.

Points to consider:

23 September 2004 - Ports that don't set their own EPOCH

I have the simple script working now. See output at http://beta.freshports.org/tmp/epoch-fetching-slave.txt. There are issues..

That page lists the ports that have a PORTEPOCH, the commits for that port, and the historical value of the PORTEPOCH value for that commit. I do this by literally fetching each revision of the Makefile. FreshPorts knows that revision is associated with each commit (that information is in the cvs-all email0.

Obtaining the PORTEPOCH values is not a simple grep command. You must do a "make -V PORTVERSION". There are 27 ports containing an EPOCH value that are also slave ports. Of these 27, two set their own EPOCH value, the other 25 get it from the MASTERPORT. It is thoese 25 ports which are going to be tougher. There are 15 distinct master ports involved (fortunately, none of them have their own MASTERPORTs).

This query returns the master ports mentioned above:

  SELECT P.id,
         C.name || '/' || E.name as portname,
         P.master_port,
         P.portepoch
    FROM categories C, element E, ports P JOIN
(  SELECT distinct pathname_id('/ports/' || master_port) as mp_element_id
    FROM ports P, element E
   WHERE P.portepoch != '0'
     AND P.element_id = E.id
     AND P.master_port != '') MP
   ON P.element_id = mp_element_id
WHERE E.id = P.element_id
  AND P.category_id  = C.id;

Some of those ports in the sub select will have EPOCH set in their own Makefile. I found three. See http://beta.freshports.org/tmp/epoch-masters.txt.

I'm not yet sure how I'm going to cope with these master ports. The others should be straight forward. I could take the scripts/Verify/set-historical-epoch.pl and use the query above.

[1] FWIW, there are 246 ports with a PORTEPOCH value. This differs from the result of this command, perhaps because not all such ports are in the INDEX I'm using (e.g. archivers/bsdtar)

awk -F\| '$1 ~ /,/ {print $2 "/Makefile"}' /usr/ports/INDEX-

Last amended: 22 September 2004