3690 lines
		
	
	
		
			122 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
		
		
			
		
	
	
			3690 lines
		
	
	
		
			122 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
|   | % -*- mode: noweb; noweb-code-mode: lua-mode -*- | ||
|  | 
 | ||
|  | \documentclass{article} | ||
|  | \usepackage{fullpage} | ||
|  | \usepackage{noweb,url} | ||
|  | \usepackage[hypertex]{hyperref} | ||
|  | \noweboptions{smallcode} | ||
|  | 
 | ||
|  | \def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em | ||
|  |     T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}} | ||
|  | 
 | ||
|  | \def\NbibTeX{{\rm N\kern-.05em{\sc bi\kern-.025em b}\kern-.08em | ||
|  |     T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}} | ||
|  | 
 | ||
|  | \let\bibtex\BibTeX | ||
|  | \let\nbibtex\NbibTeX | ||
|  | 
 | ||
|  | \title{A Replacement for \bibtex\\ | ||
|  |    (Version <VERSION>)} | ||
|  | \author{Norman Ramsey} | ||
|  | 
 | ||
|  | \setcounter{tocdepth}{2}  %% keep TOC on one page | ||
|  | \def\lbrace{\char123} | ||
|  | \def\rbrace{\char125} | ||
|  | 
 | ||
|  | 
 | ||
|  | \begin{document} | ||
|  | @ | ||
|  | 
 | ||
|  | \maketitle | ||
|  | 
 | ||
|  | \tableofcontents | ||
|  | 
 | ||
|  | \clearpage | ||
|  | 
 | ||
|  | 
 | ||
|  | \section{Overview} | ||
|  | 
 | ||
|  | The code herein comprises the ``nbib'' package, | ||
|  | which is a collection of tools to help authors take better | ||
|  | advantage of \BibTeX\ data, especially when working in collaboration. | ||
|  | The driving technology is that instead of using \BibTeX\  | ||
|  | ``keys,'' which are chosen arbitrarily and idiosyncratically, | ||
|  | nbib builds a bibliography by searching the contents of | ||
|  | citations. | ||
|  | \begin{itemize}  | ||
|  | \item | ||
|  | \texttt{nbibtex} is a drop-in replacement for \texttt{bibtex}.  | ||
|  | Authors' \verb+\cite{+\ldots\kern-2pt \verb+}+ | ||
|  |     commands are interpreted either as classic \bibtex\ keys (for | ||
|  |     backward compatibility) or as search commands.   | ||
|  | Thus, if your | ||
|  |     bibliography contains the classic paper on type inference, \texttt{nbibtex} | ||
|  |     should find it using a citation like | ||
|  | \verb+\cite{damas-milner:1978}+, or  | ||
|  | \verb+\cite{damas-milner:polymorphism}+, or perhaps even simply | ||
|  | \verb+\cite{damas-milner}+---\emph{regardless} of the \bibtex\ key you | ||
|  | may have chosen.   | ||
|  | The same citations should also work with | ||
|  |     your coauthors' bibliographies, even if they are keyed | ||
|  |     differently.  | ||
|  | \item  | ||
|  |   \texttt{nbibfind} uses the nbib search engine on the command line.  If you | ||
|  |     know you are looking for a paper by Harper and Moggi, you can just | ||
|  |     type | ||
|  | \begin{verbatim} | ||
|  |   nbibfind harper-moggi | ||
|  | \end{verbatim} | ||
|  | and see what comes out. | ||
|  | \item | ||
|  | To help you work with coauthors who don't have the nbib package, | ||
|  |     \texttt{nbibmake}\footnote | ||
|  | {Not yet implemented.} | ||
|  |  examines a {\LaTeX} document and builds a custom  | ||
|  | \texttt{.bib} file | ||
|  |     just for that document. | ||
|  | \end{itemize} | ||
|  | 
 | ||
|  | \noindent | ||
|  | The package is written in a combination of~C and Lua: | ||
|  | \begin{itemize} | ||
|  | \item | ||
|  | Because I want nbib to be able to handle bibliographies with thousands | ||
|  | or tens of thousands of entries, | ||
|  | the code to parse a \texttt{.bib} ``database'' is written in~C. | ||
|  | A~computer bought in 2003 can parse over 15,000~entries per second. | ||
|  | \item | ||
|  | Because the search for \bibtex\ entries requires string searching on | ||
|  | every entry, | ||
|  | the string search is also written in~C (and uses Boyer-Moore). | ||
|  | \item | ||
|  | Because string manipulation is much more easily done in Lua, | ||
|  | all the code that converts a \bibtex\ entry into printed matter is | ||
|  | written in Lua, | ||
|  | as is all the ``driver'' code that implements various programs. | ||
|  | \end{itemize} | ||
|  | The net result is that \texttt{nbibtex} is about five times slower | ||
|  | than classic \texttt{bibtex}. | ||
|  | This slowdown is easy to observe when printing a bibliography | ||
|  | of several thousand entries,  | ||
|  | but on a typical paper with fewer than fifty citations and a personal | ||
|  | bibliography with a thousand entries,  | ||
|  | the pause is imperceptible. | ||
|  | 
 | ||
|  | 
 | ||
|  | \subsection{Compatibility} | ||
|  | 
 | ||
|  | I've made every effort to make \nbibtex\ compatible with \bibtex, so | ||
|  | that \nbibtex\ can be used on existing papers and should produce | ||
|  | the same output as \bibtex. | ||
|  | Regrettably, compatibility means avoiding modern treatment | ||
|  | of non-ASCII characters, such as are found in the ISO Latin-1 | ||
|  | character set: | ||
|  | classic \bibtex\ simply treats every non-ASCII character as a letter. | ||
|  | \begin{itemize} | ||
|  | \item  | ||
|  | It would be pleasant to try instead to set \nbibtex\ to use an | ||
|  | ISO~8859-1 locale, but this leads to incompatible output:  | ||
|  | \nbibtex\ forces characters to lower case that \bibtex\ leaves alone. | ||
|  | <<pleasant code that results in incompatible output>>= | ||
|  | do | ||
|  |   local locales =  | ||
|  |     { "en_US", "en_AU", "en_CA", "en_GB", "fr_CA", "fr_CH", "fr_FR", } | ||
|  |   for _, l in pairs(locales) do | ||
|  |     if os.setlocale(l .. '.iso88591', 'ctype') then break end | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | \item  | ||
|  | A much less pleasant alternative would be to abandon the support that Lua | ||
|  | provides for distinguishing letters from nonletters and instead | ||
|  | to try to do some sort of system-dependent character classification, | ||
|  | as is done in \bibtex. | ||
|  | I~don't have the stomach for it. | ||
|  | \item | ||
|  | The most principled solution I~can imagine would be to define a | ||
|  | special ``\bibtex\ locale,'' whose sole purpose would be to guarantee | ||
|  | compatibility with \bibtex. | ||
|  | But this potential solution looks like a | ||
|  |   nightmare for software distribution. | ||
|  | \item | ||
|  | What I've done is proceed blithely with the user's current | ||
|  | locale, throwing in a hack here or there as needed to guarantee | ||
|  | compatibility with the test cases I~have in the default locale | ||
|  | I~happen to use. | ||
|  | The most notable case is [[bst.purify]], which is used to generate | ||
|  | keys for sorting. | ||
|  | \end{itemize} | ||
|  | Expedience carries the day.  Feh. | ||
|  | 
 | ||
|  | 
 | ||
|  | @ | ||
|  | 
 | ||
|  | \section{Parsing \texttt{.bib} files} | ||
|  | 
 | ||
|  | This section reads the \texttt{.bib} file(s). | ||
|  | <<nbib.c>>= | ||
|  | #include <stdio.h> | ||
|  | #include <assert.h> | ||
|  | #include <ctype.h> | ||
|  | #include <string.h> | ||
|  | #include <stdlib.h> | ||
|  | #include <stdarg.h> | ||
|  | 
 | ||
|  | #include <lua.h> | ||
|  | #include <lauxlib.h> | ||
|  | 
 | ||
|  | <<type definitions>> | ||
|  | <<function prototypes>> | ||
|  | <<initialized and uninitialized data>> | ||
|  | <<macro definitions>> | ||
|  | <<Procedures and functions for input scanning>> | ||
|  | <<function definitions>> | ||
|  | @  | ||
|  | \subsection{Internal interfaces} | ||
|  | 
 | ||
|  | \subsubsection {Data structures} | ||
|  | 
 | ||
|  | For convenience in keeping function prototypes uncluttered, | ||
|  | all state associated with reading a particular \bibtex\ file is stored | ||
|  | in a single [[Bibreader]] abstraction. | ||
|  | That state is divided into three groups: | ||
|  | \begin{itemize} | ||
|  | \item | ||
|  | Fields that say what file we are reading and what is our position | ||
|  | within that file | ||
|  | \item | ||
|  | A~buffer that holds one line of the \texttt{.bib} file currently being | ||
|  | scanned | ||
|  | \item | ||
|  | State accessible from Lua: an interpreter; | ||
|  | a list of strings from the \texttt{.bib} preamble, which is exposed to | ||
|  | the client; | ||
|  | a warning function provided by the client; | ||
|  | and a macro table provided by the client and updated by  | ||
|  | [[@string]] commands | ||
|  | \end{itemize} | ||
|  | In the buffer, | ||
|  | the meaningful characters are in the half-open interval $[{}[[buf]], | ||
|  |   [[lim]])$, | ||
|  | and we reserve space for a sentinel at~[[lim]]. | ||
|  | The invariant is that $[[buf]] \le [[cur]] < [[lim]]$ | ||
|  | and $[[buf]]+[[bufsize]] \ge [[lim]]+1$. | ||
|  | <<type definitions>>= | ||
|  | typedef struct bibreader { | ||
|  |   const char *filename;             /* name of the .bib file */ | ||
|  |   FILE *file;                       /* .bib file open for read */ | ||
|  |   int line_num;                     /* line number of the .bib file */ | ||
|  |   int entry_line;                   /* line number of last seen entry start */ | ||
|  | 
 | ||
|  |   unsigned char *buf, *cur, *lim;   /* input buffer */ | ||
|  |   unsigned bufsize;                 /* size of buffer */ | ||
|  |   char entry_close;   /* character expected to close current entry */ | ||
|  | 
 | ||
|  |   lua_State *L; | ||
|  |   int preamble;       /* reference to preamble list of strings */ | ||
|  |   int warning;        /* reference to universal warning function */ | ||
|  |   int macros;         /* reference to macro table */ | ||
|  | } *Bibreader; | ||
|  | @  | ||
|  | The [[is_id_char]] array is used to define a predicate that says | ||
|  | whether a character is considered part of an identifier. | ||
|  | <<initialized and uninitialized data>>= | ||
|  | bool is_id_char[256];    /* needs initialization */ | ||
|  | #define concat_char '#'  /* used to concatenate parts of a field defn */ | ||
|  | @  | ||
|  | 
 | ||
|  | 
 | ||
|  | \subsubsection {Scanning} | ||
|  | Most internal functions are devoted to some form of scanning. | ||
|  | The model is a bit like Icon: scanning may succeed or fail, and it has | ||
|  | a side effect on the state of the reader---in particular the value of | ||
|  | the [[cur]] pointer, and possibly also the contents of the buffer. | ||
|  | (Unlike Icon, there is no backtracking.) | ||
|  | Success or failure is nonzero or zero but is represented using type [[bool]]. | ||
|  | <<function prototypes>>= | ||
|  | typedef int bool; | ||
|  | @ | ||
|  | Function [[getline]] refills the buffer with a new line (and updates | ||
|  | [[line_num]]), returning failure on end of file. | ||
|  | <<function prototypes>>= | ||
|  | static bool getline(Bibreader rdr); | ||
|  | @  | ||
|  | Several scanning functions come in two flavors, | ||
|  | which depend on what happends at the end of a line: | ||
|  | the [[_getline]] flavor refills the buffer and keeps scanning; | ||
|  | the normal flavor fails. | ||
|  | Here are some functions that scan for combinations of particular | ||
|  | characters, whitespace, and nonwhite characters. | ||
|  | <<function prototypes>>= | ||
|  | static bool upto1(Bibreader rdr, char c); | ||
|  | static bool upto1_getline(Bibreader rdr, char c); | ||
|  | static void upto_white_or_1(Bibreader rdr, char c); | ||
|  | static void upto_white_or_2(Bibreader rdr, char c1, char c2); | ||
|  | static void upto_white_or_3(Bibreader rdr, char c1, char c2, char c3); | ||
|  | static bool upto_nonwhite(Bibreader rdr); | ||
|  | static bool upto_nonwhite_getline(Bibreader rdr); | ||
|  | @ Because there is always whitespace at the end of a line, the | ||
|  | [[upto_white_*]] flavor cannot fail. | ||
|  | @ | ||
|  | Some more sophisticated scanning functions. | ||
|  | None attempts to return a value; instead each functions scans past the | ||
|  | token in question, which the client can then find between the old and | ||
|  | new values of the [[cur]] pointer. | ||
|  | <<function prototypes>>= | ||
|  | static bool scan_identifier (Bibreader rdr, char c1, char c2, char c3); | ||
|  | static bool scan_nonneg_integer (Bibreader rdr, unsigned *np); | ||
|  | @  | ||
|  | Continuing from low to high level, here are | ||
|  | functions used to scan fields, about which more below: | ||
|  | <<function prototypes>>= | ||
|  | static bool scan_and_buffer_a_field_token (Bibreader rdr, int key, luaL_Buffer *b); | ||
|  | static bool scan_balanced_braces(Bibreader rdr, char close, luaL_Buffer *b); | ||
|  | static bool scan_and_push_the_field_value (Bibreader rdr, int key); | ||
|  | @  | ||
|  | Two utility functions used after scanning: | ||
|  | The [[lower_case]] function overwrites buffer characters with their | ||
|  | lowercase equivalents. | ||
|  | The [[strip_leading_and_trailing_space]] functions removes leading and | ||
|  | trailing space characters from a string on top of the Lua stack. | ||
|  | <<function prototypes>>= | ||
|  | static void lower_case(unsigned char *p, unsigned char *lim); | ||
|  | static void strip_leading_and_trailing_space(lua_State *L); | ||
|  | @  | ||
|  | \subsubsection{Other functions} | ||
|  | <<function prototypes>>= | ||
|  | static int get_bib_command_or_entry_and_process(Bibreader rdr); | ||
|  | int luaopen_bibtex (lua_State *L); | ||
|  | @ | ||
|  | \subsubsection{Commands} | ||
|  | 
 | ||
|  | In addition to database entries, a \texttt{.bib} file may contain  | ||
|  | the [[comment]], [[preamble]], and [[string]] commands. | ||
|  | Each is implemented by a function of type [[Command]], which is | ||
|  | associated with the name by [[find_command]]. | ||
|  | <<function prototypes>>= | ||
|  | typedef bool (*Command)(Bibreader); | ||
|  | static Command find_command(unsigned char *p, unsigned char *lim); | ||
|  | static bool do_comment (Bibreader rdr); | ||
|  | static bool do_preamble(Bibreader rdr); | ||
|  | static bool do_string  (Bibreader rdr); | ||
|  | @  | ||
|  | 
 | ||
|  | \subsubsection{Error handling} | ||
|  | 
 | ||
|  | The [[warnv]] function is used to call the warning function supplied | ||
|  | by the Lua client. | ||
|  | In addition to the reader, it takes as arguments the number of results | ||
|  | expected and the signature of the arguments. | ||
|  | (The warning function may receive any combination of string~([[s]]),  | ||
|  | floating-point~([[f]]), and integer~([[d]]) arguments; | ||
|  | the [[fmt]] string gives the sequence of the arguments that follow.) | ||
|  | <<function prototypes>>= | ||
|  | static void warnv(Bibreader rdr, int nres, const char *fmt, ...); | ||
|  | @  | ||
|  | There's a lot of crap here to do with reporting errors. | ||
|  | An error in a function called direct from Lua | ||
|  | pushes [[false]] and a message and returns~[[2]]; | ||
|  | an error in a boolean function pushes the same but returns failure to | ||
|  | its caller. | ||
|  | I~hope to replace this code with native Lua error handling ([[lua_error]]). | ||
|  | <<macro definitions>>= | ||
|  | #define LERRPUSH(S) do { \ | ||
|  |   if (!lua_checkstack(rdr->L, 10)) assert(0); \ | ||
|  |   lua_pushboolean(rdr->L, 0); \ | ||
|  |   lua_pushfstring(rdr->L, "%s, line %d: ", rdr->filename, rdr->line_num); \ | ||
|  |   lua_pushstring(rdr->L, S); \ | ||
|  |   lua_concat(rdr->L, 2); \ | ||
|  |   } while(0) | ||
|  | #define LERRFPUSH(S,A) do { \ | ||
|  |   if (!lua_checkstack(rdr->L, 10)) assert(0); \ | ||
|  |   lua_pushboolean(rdr->L, 0); \ | ||
|  |   lua_pushfstring(rdr->L, "%s, line %d: ", rdr->filename, rdr->line_num); \ | ||
|  |   lua_pushfstring(rdr->L, S, A); \ | ||
|  |   lua_concat(rdr->L, 2); \ | ||
|  |   } while(0) | ||
|  | #define LERR(S)   do { LERRPUSH(S);   return 2; } while(0) | ||
|  | #define LERRF(S,A) do { LERRFPUSH(S,A); return 2; } while(0) | ||
|  |   /* next: cases for Boolean functions */ | ||
|  | #define LERRB(S)   do { LERRPUSH(S);   return 0; } while(0) | ||
|  | #define LERRFB(S,A) do { LERRFPUSH(S,A); return 0; } while(0) | ||
|  | @ | ||
|  | \subsection{Reading a database entry} | ||
|  | 
 | ||
|  | Syntactically, a \texttt{.bib} file is a | ||
|  | sequence of entries, perhaps with a few \texttt{.bib} commands thrown | ||
|  | in. | ||
|  | Each entry consists of an at~sign, an entry | ||
|  | type, and, between braces or parentheses and separated by commas, a | ||
|  | database key and a list of fields.  Each field consists of a field | ||
|  | name, an equals sign, and nonempty list of field tokens separated by | ||
|  | [[concat_char]]s.  Each field token is either a nonnegative number, a | ||
|  | macro name (like `jan'), or a brace-balanced string delimited by | ||
|  | either double quotes or braces.  Finally, case differences are | ||
|  | ignored for all but delimited strings and database keys, and | ||
|  | whitespace characters and ends-of-line may appear in all reasonable | ||
|  | places (i.e., anywhere except within entry types, database keys, field | ||
|  | names, and macro names); furthermore, comments may appear anywhere | ||
|  | between entries (or before the first or after the last) as long as | ||
|  | they contain no at~signs. | ||
|  | 
 | ||
|  | This function reads a database entry and pushes it on the Lua stack. | ||
|  | Any commands encountered before the database entry are executed. | ||
|  | If no entry remains, the function returns~0. | ||
|  | <<function definitions>>= | ||
|  | #undef ready_tok | ||
|  | #define ready_tok(RDR) do { \ | ||
|  |   if (!upto_nonwhite_getline(RDR)) \ | ||
|  |     LERR("Unexpected end of file"); \ | ||
|  |   } while(0) | ||
|  | 
 | ||
|  | static int get_bib_command_or_entry_and_process(Bibreader rdr) { | ||
|  |   unsigned char *id, *key; | ||
|  |   int keyindex; | ||
|  |   bool (*command)(Bibreader); | ||
|  |  getnext: | ||
|  |   <<scan [[rdr]] up to and past the next [[@]] sign and skip white space (or return 0)>> | ||
|  | 
 | ||
|  |   id = rdr->cur; | ||
|  |   if (!scan_identifier (rdr, '{', '(', '(')) | ||
|  |     LERR("Expected an entry type"); | ||
|  |   lower_case (id, rdr->cur);       /* ignore case differences */ | ||
|  |   <<if $[{}[[id]], \mbox{[[rdr->cur]]})$ points to a command, execute it and go to [[getnext]]>> | ||
|  | 
 | ||
|  |   lua_pushlstring(rdr->L, (char *) id, rdr->cur - id);    /* push entry type */ | ||
|  |   rdr->entry_line = rdr->line_num; | ||
|  |   ready_tok(rdr); | ||
|  |   <<scan past opening delimiter and set [[rdr->entry_close]]>> | ||
|  |   ready_tok(rdr); | ||
|  |   key = rdr->cur; | ||
|  |   <<set [[rdr->cur]] to next whitespace, comma, or possibly [[}]]>> | ||
|  |   lua_pushlstring(rdr->L, (char *) key, rdr->cur - key);  /* push database key */ | ||
|  |   keyindex = lua_gettop(rdr->L); | ||
|  |   lua_newtable(rdr->L);                                   /* push table of fields */ | ||
|  |   ready_tok(rdr); | ||
|  |   for (; *rdr->cur != rdr->entry_close; ) { | ||
|  |     <<absorb comma (breaking if followed by [[rdr->entry_close]])>> | ||
|  |     <<read a field-value pair and set it in the field table, which is on top of the Lua stack>> | ||
|  |     ready_tok(rdr); | ||
|  |   } | ||
|  |   rdr->cur++;  /* skip past close of entry */ | ||
|  |   return 3; /* entry type, key, table of fields */ | ||
|  | } | ||
|  | @  | ||
|  | <<scan [[rdr]] up to and past the next [[@]] sign and skip white space (or return 0)>>= | ||
|  | if (!upto1_getline(rdr, '@')) | ||
|  |   return 0;  /* no more entries; return nil */ | ||
|  | assert(*rdr->cur == '@'); | ||
|  | rdr->cur++;   /* skip the @ sign */ | ||
|  | ready_tok(rdr); | ||
|  | @  | ||
|  | <<if $[{}[[id]], \mbox{[[rdr->cur]]})$ points to a command, execute it and go to [[getnext]]>>= | ||
|  | command = find_command(id, rdr->cur); | ||
|  | if (command) { | ||
|  |   if (!command(rdr)) | ||
|  |     return 2; /* command put (false, message) on Lua stack; we're done */ | ||
|  |   goto getnext; | ||
|  | } | ||
|  | @  | ||
|  | An entry is delimited either by braces or by brackets; | ||
|  | in order to recognize the correct closing delimiter, we put it in | ||
|  | [[rdr->entry_close]].  | ||
|  | <<scan past opening delimiter and set [[rdr->entry_close]]>>= | ||
|  | if (*rdr->cur == '{')  | ||
|  |     rdr->entry_close = '}'; | ||
|  | else if (*rdr->cur == '(')  | ||
|  |     rdr->entry_close = ')'; | ||
|  | else | ||
|  |     LERR("Expected entry to open with { or ("); | ||
|  | rdr->cur++; | ||
|  | @  | ||
|  | I'm not quite sure why stopping at~[[}]] is conditional on the closing | ||
|  | delimiter in this way. | ||
|  | <<set [[rdr->cur]] to next whitespace, comma, or possibly [[}]]>>= | ||
|  | if (rdr->entry_close == '}') { | ||
|  |   upto_white_or_1(rdr, ','); | ||
|  | } else { | ||
|  |   upto_white_or_2(rdr, ',', '}'); | ||
|  | } | ||
|  | @  | ||
|  | At this point we're at a nonwhite token that is not the closing | ||
|  | delimiter. | ||
|  | If it's not a comma, there's big trouble---but even if it is, | ||
|  | the database may be using comma as a terminator, in which case a | ||
|  | closing delimiter signals the end of the entry. | ||
|  | <<absorb comma (breaking if followed by [[rdr->entry_close]])>>= | ||
|  | if (*rdr->cur == ',') { | ||
|  |   rdr->cur++; | ||
|  |   ready_tok(rdr); | ||
|  |   if (*rdr->cur == rdr->entry_close) { | ||
|  |     break; | ||
|  |   } | ||
|  | } else { | ||
|  |   LERR("Expected comma or end of entry"); | ||
|  | } | ||
|  | @  | ||
|  | The syntax for a field is \emph{identifier}\texttt{=}\emph{value}. | ||
|  | The field name is forced to lower case. | ||
|  | <<read a field-value pair and set it in the field table, which is on top of the Lua stack>>= | ||
|  | if (id = rdr->cur, !scan_identifier (rdr, '=', '=', '=')) | ||
|  |   LERR("Expected a field name"); | ||
|  | lower_case(id, rdr->cur); | ||
|  | lua_pushlstring(rdr->L, (char *) id, rdr->cur - id);  /* push field name */ | ||
|  | ready_tok(rdr); | ||
|  | if (*rdr->cur != '=') | ||
|  |   LERR("Expected '=' to follow field name"); | ||
|  | rdr->cur++;                    /* skip over the [['=']] */ | ||
|  | ready_tok(rdr); | ||
|  | if (!scan_and_push_the_field_value(rdr, keyindex)) | ||
|  |   return 2; | ||
|  | strip_leading_and_trailing_space(rdr->L); | ||
|  | <<if field is not already set, set it; otherwise warn>> | ||
|  | @  | ||
|  | Official \bibtex\ does not permit duplicate entries for a single | ||
|  | field. | ||
|  | But in entries on the net, you see lots of such duplicates in such | ||
|  | unofficial fields as \texttt{reffrom}. | ||
|  | Because classic \bibtex\ doesn't report errors on fields that aren't | ||
|  | advertised by the \texttt{.bst} file, we don't want to just blat out a | ||
|  | whole bunch of warning messages. | ||
|  | So instead we dump the problem on the warning function provided by the Lua | ||
|  | client. | ||
|  | 
 | ||
|  | We therefore can't simply set the field in the field table: | ||
|  | we first look it up, and | ||
|  | if it is nil, we set it; otherwise we warn. | ||
|  | <<if field is not already set, set it; otherwise warn>>= | ||
|  | lua_pushvalue(rdr->L, -2);  /* push key */ | ||
|  | lua_gettable(rdr->L, -4); | ||
|  | if (lua_isnil(rdr->L, -1)) { | ||
|  |   lua_pop(rdr->L, 1); | ||
|  |   lua_settable(rdr->L, -3); | ||
|  | } else { | ||
|  |   lua_pop(rdr->L, 1);  /* off comes old value  */ | ||
|  |   warnv(rdr, 0, "ssdsss", /* tag, file, line, cite-key, field, newvalue */ | ||
|  |         "extra field", rdr->filename, rdr->line_num, | ||
|  |         lua_tostring(rdr->L, keyindex), | ||
|  |         lua_tostring(rdr->L, -2), lua_tostring(rdr->L, -1)); | ||
|  |   lua_pop(rdr->L, 2);  /* off come key and new value */ | ||
|  | } | ||
|  | @  | ||
|  | \subsection{Scanning functions} | ||
|  | 
 | ||
|  | 
 | ||
|  | \subsubsection{Scanning functions for fields} | ||
|  | @ | ||
|  | While scanning fields, we are not operating in a toplevel function, so | ||
|  | the error handling for [[ready_tok]] needs to be a bit different. | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | #undef ready_tok | ||
|  | #define ready_tok(RDR) do { \ | ||
|  |   if (!upto_nonwhite_getline(RDR)) \ | ||
|  |     LERRB("Unexpected end of file"); \ | ||
|  |   } while(0) | ||
|  | @  | ||
|  | Each field value is accumulated into a [[luaL_Buffer]] from the Lua | ||
|  | auxiliary library. | ||
|  | The buffer is always called~[[b]]; | ||
|  | for conciseness, we use the macro [[copy_char]] to add a character to | ||
|  | it.  | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | #define copy_char(C) luaL_putchar(b, (C)) | ||
|  | @  | ||
|  | A field value is a sequence of one or more tokens separated by a | ||
|  | [[concat_char]]  ([[#]]~mark). | ||
|  | A~precondition for calling [[scan_and_push_the_field_value]] is that | ||
|  | [[rdr]] is pointing at a nonwhite character. | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | static bool scan_and_push_the_field_value (Bibreader rdr, int key) { | ||
|  |   luaL_Buffer field; | ||
|  | 
 | ||
|  |   luaL_checkstack(rdr->L, 10, "Not enough Lua stack to parse bibtex database"); | ||
|  |   luaL_buffinit(rdr->L, &field); | ||
|  |   for (;;) { | ||
|  |     if (!scan_and_buffer_a_field_token(rdr, key, &field))  | ||
|  |       return 0; | ||
|  |     ready_tok(rdr); /* cur now points to [[concat_char]] or end of field */ | ||
|  |     if (*rdr->cur != concat_char) break; | ||
|  |     else { rdr->cur++; ready_tok(rdr); } | ||
|  |   } | ||
|  |   luaL_pushresult(&field); | ||
|  |   return 1; | ||
|  | } | ||
|  | @ Because [[ready_tok]] can [[return]] in case of error, we can't write | ||
|  | \begin{quote}  | ||
|  | [[for(; *rdr->cur == concat_char; rdr->cur++, ready_tok(rdr))]]. | ||
|  | \end{quote} | ||
|  | @ | ||
|  | A field token is either a nonnegative number, a macro name (like | ||
|  | `jan'), or a brace-balanced string delimited by either double quotes | ||
|  | or braces. | ||
|  | Thus there are four possibilities for the first character | ||
|  | of the field token: If it's a left brace or a double quote, the | ||
|  | token (with balanced braces, up to the matchin closing delimiter) is | ||
|  | a string; if it's a digit, the token is a number; if it's anything | ||
|  | else, the token is a macro name (and should thus have been defined by | ||
|  | either the \texttt{.bst}-file's \texttt{macro} command or the \texttt{.bib}-file's | ||
|  | \texttt{string} command).  This function returns [[false]] if there was a | ||
|  | serious syntax error. | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | static bool scan_and_buffer_a_field_token (Bibreader rdr, int key, luaL_Buffer *b) { | ||
|  |   unsigned char *p; | ||
|  |   unsigned number; | ||
|  |   *rdr->lim = ' '; | ||
|  |   switch (*rdr->cur) { | ||
|  |     case '{': case '"': | ||
|  |       return scan_balanced_braces(rdr, *rdr->cur == '{' ? '}' : '"', b); | ||
|  |     case '0': case '1': case '2': case '3': case '4': | ||
|  |     case '5': case '6': case '7': case '8': case '9':  | ||
|  |       p = rdr->cur; | ||
|  |       scan_nonneg_integer(rdr, &number); | ||
|  |       luaL_addlstring(b, (char *)p, rdr->cur - p); | ||
|  |       return 1; | ||
|  |    default: | ||
|  |       /* named macro */ | ||
|  |       p = rdr->cur; | ||
|  |       if (!scan_identifier(rdr, ',', rdr->entry_close, concat_char)) | ||
|  |         LERRB("Expected a field part"); | ||
|  |       lower_case (p, rdr->cur);   /* ignore case differences */ | ||
|  |         /* missing warning of macro name used in its own definition */ | ||
|  |       lua_pushlstring(rdr->L, (char *) p, rdr->cur - p); /* stack: name */ | ||
|  |       lua_getref(rdr->L, rdr->macros);                   /* stack: name macros */ | ||
|  |       lua_insert(rdr->L, -2);                            /* stack: name macros name */ | ||
|  |       lua_gettable(rdr->L, -2);                          /* stack: name defn */ | ||
|  |       lua_remove(rdr->L, -2);                            /* stack: defn */ | ||
|  |       <<if top of stack is nil, pop it and warn of undefined macro; else buffer it>> | ||
|  |       return 1; | ||
|  |   } | ||
|  | } | ||
|  | @  | ||
|  | Here's another warning that's kicked out to the client. | ||
|  | Reason: standard \bibtex\ complains only if it intends to use the | ||
|  | entry in question. | ||
|  | <<if top of stack is nil, pop it and warn of undefined macro; else buffer it>>= | ||
|  | { int t = lua_gettop(rdr->L); | ||
|  |   if (lua_isnil(rdr->L, -1)) { | ||
|  |     lua_pop(rdr->L, 1); | ||
|  |     lua_pushlstring(rdr->L, (char *) p, rdr->cur - p); | ||
|  |     warnv(rdr, 1, "ssdss", /* tag, file, line, key, macro */ | ||
|  |           "undefined macro", rdr->filename, rdr->line_num, | ||
|  |           key ? lua_tostring(rdr->L, key) : NULL, lua_tostring(rdr->L, -1)); | ||
|  |     if (lua_isstring(rdr->L, -1)) | ||
|  |       luaL_addvalue(b); | ||
|  |     else | ||
|  |       lua_pop(rdr->L, 1); | ||
|  |     lua_pop(rdr->L, 1); | ||
|  |   } else { | ||
|  |     luaL_addvalue(b); | ||
|  |   } | ||
|  |   assert(lua_gettop(rdr->L) == t-1); | ||
|  | } | ||
|  | @ | ||
|  | This \texttt{.bib}-specific function scans and buffers a string with | ||
|  | balanced braces, stopping just past the matching [[close]].   | ||
|  | The original \bibtex\ tries to optimize the common case of a field with | ||
|  | no internal braces; I~don't. | ||
|  | A~precondition for calling this function is that [[rdr->cur]] point at | ||
|  | the opening delimiter. | ||
|  | Whitespace is compressed to a single space character. | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | static int scan_balanced_braces(Bibreader rdr, char close, luaL_Buffer *b) { | ||
|  |   unsigned char *p, *cur, c; | ||
|  |   int braces = 0;  /* number of currently open braces *inside* string */ | ||
|  | 
 | ||
|  |   rdr->cur++;   /* scan past left delimiter */ | ||
|  |   *rdr->lim = ' '; | ||
|  |   if (isspace(*rdr->cur)) { | ||
|  |     copy_char(' '); | ||
|  |     ready_tok(rdr); | ||
|  |   } | ||
|  |   for (;;) { | ||
|  |     p = rdr->cur; | ||
|  |     upto_white_or_3(rdr, '}', '{', close); | ||
|  |     cur = rdr->cur; | ||
|  |     for ( ; p < cur; p++) /* copy nonwhite, nonbrace characters */ | ||
|  |       copy_char(*p); | ||
|  |     *rdr->lim = ' '; | ||
|  |     c = *cur;  /* will be whitespace if at end of line */ | ||
|  |     <<depending on [[c]], return or adjust [[braces]] and continue>> | ||
|  |   } | ||
|  | } | ||
|  | @  | ||
|  | Beastly complicated: | ||
|  | \begin{itemize} | ||
|  | \item | ||
|  | Space is compressed and scanned past. | ||
|  | \item | ||
|  | A closing delimiter ends the scan at brace level~0 and otherwise is | ||
|  | buffered. | ||
|  | \item | ||
|  | Braces adjust the [[braces]] count. | ||
|  | \end{itemize} | ||
|  | <<depending on [[c]], return or adjust [[braces]] and continue>>= | ||
|  | if (isspace(c)) { | ||
|  |   copy_char(' '); | ||
|  |   ready_tok(rdr); | ||
|  | } else { | ||
|  |   rdr->cur++; | ||
|  |   if (c == close) { | ||
|  |     if (braces == 0) { | ||
|  |       luaL_pushresult(b); | ||
|  |       return 1; | ||
|  |     } else { | ||
|  |       copy_char(c); | ||
|  |       if (c == '}') | ||
|  |         braces--; | ||
|  |     } | ||
|  |   } else if (c == '{') { | ||
|  |     braces++; | ||
|  |     copy_char(c); | ||
|  |   } else { | ||
|  |     assert(c == '}'); | ||
|  |     if (braces > 0) { | ||
|  |       braces--; | ||
|  |       copy_char(c); | ||
|  |     } else { | ||
|  |       luaL_pushresult(b); /* restore invariant */ | ||
|  |       LERRB("Unexpected '}'"); | ||
|  |     } | ||
|  |   } | ||
|  | } | ||
|  | @  | ||
|  | \subsubsection {Low-level scanning functions} | ||
|  | Scan the reader up to the character requested or end of line; | ||
|  | fails if not found. | ||
|  | <<function definitions>>= | ||
|  | static bool upto1(Bibreader rdr, char c) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   unsigned char *lim = rdr->lim; | ||
|  |   *lim = c; | ||
|  |   while (*p != c) | ||
|  |     p++; | ||
|  |   rdr->cur = p; | ||
|  |   return p < lim; | ||
|  | } | ||
|  | @ | ||
|  | Scan the reader up to the character requested or end of file; | ||
|  | fails if not found. | ||
|  | <<function definitions>>= | ||
|  | static int upto1_getline(Bibreader rdr, char c) { | ||
|  |   while (!upto1(rdr, c)) | ||
|  |     if (!getline(rdr)) | ||
|  |       return 0; | ||
|  |   return 1; | ||
|  | } | ||
|  | @  | ||
|  | Scan the reader up to the next whitespace or the one character requested. | ||
|  | Always succeeds, because the end of the line is whitespace. | ||
|  | <<function definitions>>= | ||
|  | static void upto_white_or_1(Bibreader rdr, char c) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   unsigned char *lim = rdr->lim; | ||
|  |   *lim = c; | ||
|  |   while (*p != c && !isspace(*p)) | ||
|  |     p++; | ||
|  |   rdr->cur = p; | ||
|  | } | ||
|  | @  | ||
|  | Scan the reader up to the next whitespace or either of two characters requested. | ||
|  | <<function definitions>>= | ||
|  | static void upto_white_or_2(Bibreader rdr, char c1, char c2) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   unsigned char *lim = rdr->lim; | ||
|  |   *lim = c1; | ||
|  |   while (*p != c1 && *p != c2 && !isspace(*p)) | ||
|  |     p++; | ||
|  |   rdr->cur = p; | ||
|  | } | ||
|  | @  | ||
|  | Scan the reader up to the next whitespace or any of three characters requested. | ||
|  | <<function definitions>>= | ||
|  | static void upto_white_or_3(Bibreader rdr, char c1, char c2, char c3) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   unsigned char *lim = rdr->lim; | ||
|  |   *lim = c1; | ||
|  |   while (!isspace(*p) && *p != c1 && *p != c2 && *p != c3) | ||
|  |     p++; | ||
|  |   rdr->cur = p; | ||
|  | } | ||
|  | @  | ||
|  | This function scans over whitespace characters, stopping either at | ||
|  | the first nonwhite character or the end of the line, respectively | ||
|  | returning [[true]] or [[false]]. | ||
|  | <<function definitions>>= | ||
|  | static bool upto_nonwhite(Bibreader rdr) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   unsigned char *lim = rdr->lim; | ||
|  |   *lim = 'x'; | ||
|  |   while (isspace(*p)) | ||
|  |     p++; | ||
|  |   rdr->cur = p; | ||
|  |   return p < lim; | ||
|  | } | ||
|  | @ | ||
|  | Scan past whitespace up to end of file if needed; | ||
|  | returns true iff nonwhite character found. | ||
|  | <<function definitions>>= | ||
|  | static int upto_nonwhite_getline(Bibreader rdr) { | ||
|  |   while (!upto_nonwhite(rdr)) | ||
|  |     if (!getline(rdr)) | ||
|  |       return 0; | ||
|  |   return 1; | ||
|  | } | ||
|  | @  | ||
|  | \subsubsection{Actual input} | ||
|  | <<function definitions>>= | ||
|  | static bool getline(Bibreader rdr) { | ||
|  |   char *result; | ||
|  |   unsigned char *buf = rdr->buf; | ||
|  |   int n; | ||
|  |   result = fgets((char *)buf, rdr->bufsize, rdr->file); | ||
|  |   if (result == NULL) | ||
|  |     return 0; | ||
|  |   rdr->line_num++; | ||
|  |   for (n = strlen((char *)buf); buf[n-1] != '\n'; n = strlen((char *)buf)) { | ||
|  |     /* failed to get whole line */ | ||
|  |     rdr->bufsize *= 2; | ||
|  |     buf = rdr->buf = realloc(rdr->buf, rdr->bufsize); | ||
|  |     assert(buf); | ||
|  |     if (fgets((char *)buf+n,rdr->bufsize-n,rdr->file)==NULL) { | ||
|  |       n = strlen((char *)buf) + 1;  /* -1 below is incorrect without newline */ | ||
|  |       break; /* file ended without a newline */ | ||
|  |     } | ||
|  |   } | ||
|  |   rdr->cur = buf; | ||
|  |   rdr->lim = buf+n-1;  /* trailing newline not in string */ | ||
|  |   return 1;  | ||
|  | } | ||
|  | @  | ||
|  | \subsubsection{Medium-level scanning functions} | ||
|  | 
 | ||
|  | This procedure scans for an identifier, stopping at the first | ||
|  | [[illegal_id_char]], or stopping at the first character if it's | ||
|  | [[numeric]].  It sets the global variable [[scan_result]] to [[id_null]] if | ||
|  | the identifier is null, else to [[white_adjacent]] if it ended at a | ||
|  | whitespace character or an end-of-line, else to | ||
|  | [[specified_char_adjacent]] if it ended at one of [[char1]] or [[char2]] or | ||
|  | [[char3]], else to [[other_char_adjacent]] if it ended at a nonspecified, | ||
|  | nonwhitespace [[illegal_id_char]].  By convention, when some calling | ||
|  | code really wants just one or two ``specified'' characters, it merely | ||
|  | repeats one of the characters. | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | static int scan_identifier (Bibreader rdr, char c1, char c2, char c3) { | ||
|  |   unsigned char *p, *orig, c; | ||
|  | 
 | ||
|  |   orig = p = rdr->cur; | ||
|  |   if (!isdigit(*p)) { | ||
|  |     /* scan until end-of-line or an [[illegal_id_char]] */ | ||
|  |     *rdr->lim = ' ';  /* an illegal id character and also white space */ | ||
|  |     while (is_id_char[*p]) | ||
|  |       p++; | ||
|  |   } | ||
|  |   c = *p; | ||
|  |   if (p > rdr->cur && (isspace(c) || c == c1 || c == c2 || c == c3)) { | ||
|  |     rdr->cur = p; | ||
|  |     return 1; | ||
|  |   } else { | ||
|  |     return 0; | ||
|  |   } | ||
|  | } | ||
|  | @ | ||
|  | This function scans for a nonnegative integer, stopping at the first | ||
|  | nondigit; it writes the resulting integer through [[np]].   | ||
|  | It returns | ||
|  | [[true]] if the token was a legal nonnegative integer (i.e., consisted | ||
|  | of one or more digits). | ||
|  | <<Procedures and functions for input scanning>>= | ||
|  | static bool scan_nonneg_integer (Bibreader rdr, unsigned *np) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   unsigned n = 0; | ||
|  |   *rdr->lim = ' ';   /* sentinel */ | ||
|  |   while (isdigit(*p)) { | ||
|  |     n = n * 10 + (*p - '0'); | ||
|  |     p++; | ||
|  |   } | ||
|  |   if (p == rdr->cur) | ||
|  |     return 0; /* no digits */ | ||
|  |   else { | ||
|  |     rdr->cur = p; | ||
|  |     *np = n; | ||
|  |     return 1; | ||
|  |   } | ||
|  | } | ||
|  | @ | ||
|  | This procedure scans for an integer, stopping at the first nondigit; | ||
|  | it sets the value of [[token_value]] accordingly.  It returns [[true]] if | ||
|  | the token was a legal integer (i.e., consisted of an optional | ||
|  | [[minus_sign]] followed by one or more digits). | ||
|  | <<unused Procedures and functions for input scanning>>= | ||
|  | static bool scan_integer (Bibreader rdr) { | ||
|  |   unsigned char *p = rdr->cur; | ||
|  |   int n = 0; | ||
|  |   int sign = 0;      /* number of characters of sign */ | ||
|  |   *rdr->lim = ' ';   /* sentinel */ | ||
|  |   if (*p == '-') { | ||
|  |     sign = 1; | ||
|  |     p++; | ||
|  |   } | ||
|  |   while (isdigit(*p)) { | ||
|  |     n = n * 10 + (*p - '0'); | ||
|  |     p++; | ||
|  |   } | ||
|  |   if (p == rdr->cur) | ||
|  |     return 0; /* no digits */ | ||
|  |   else { | ||
|  |     rdr->cur = p; | ||
|  |     return 1; | ||
|  |   } | ||
|  | } | ||
|  | @  | ||
|  | \subsection{C~utility functions} | ||
|  | @  | ||
|  | <<function definitions>>= | ||
|  | static void lower_case(unsigned char *p, unsigned char *lim) { | ||
|  |   for (; p < lim; p++) | ||
|  |     *p = tolower(*p); | ||
|  | } | ||
|  | @  | ||
|  | <<function definitions>>= | ||
|  | static void strip_leading_and_trailing_space(lua_State *L) { | ||
|  |   const char *p; | ||
|  |   int n; | ||
|  |   assert(lua_isstring(L, -1)); | ||
|  |   p = lua_tostring(L, -1); | ||
|  |   n = lua_strlen(L, -1); | ||
|  |   if (n > 0 && (isspace(*p) || isspace(p[n-1]))) { | ||
|  |     while(n > 0 && isspace(*p)) | ||
|  |       p++, n--; | ||
|  |     while(n > 0 && isspace(p[n-1])) | ||
|  |       n--; | ||
|  |     lua_pushlstring(L, p, n); | ||
|  |     lua_remove(L, -2); | ||
|  |   } | ||
|  | } | ||
|  | @  | ||
|  | \subsection{Implementations of the \bibtex\ commands} | ||
|  | 
 | ||
|  | On encountering an [[@]]\emph{identifier}, we ask if the | ||
|  | \emph{identifier} stands for a command and if so, return that command. | ||
|  | <<function definitions>>= | ||
|  | static Command find_command(unsigned char *p, unsigned char *lim) { | ||
|  |   int n = lim - p; | ||
|  |   assert(lim > p); | ||
|  | #define match(S) (!strncmp(S, (char *)p, n) && (S)[n] == '\0') | ||
|  |   switch(*p) { | ||
|  |     case 'c' : if (match("comment"))  return do_comment;  else break; | ||
|  |     case 'p' : if (match("preamble")) return do_preamble; else break; | ||
|  |     case 's' : if (match("string"))   return do_string;   else break; | ||
|  |   } | ||
|  |   return (Command)0; | ||
|  | } | ||
|  | @ | ||
|  | %% \webindexsort{database-file commands}{\quad \texttt{comment}} | ||
|  | The \texttt{comment} command is implemented for SCRIBE compatibility.  It's | ||
|  | not really needed because \BibTeX\ treats (flushes) everything not | ||
|  | within an entry as a comment anyway. | ||
|  | <<function definitions>>= | ||
|  | static bool do_comment(Bibreader rdr) { | ||
|  |   return 1; | ||
|  | } | ||
|  | @ | ||
|  | %% \webindexsort{database-file commands}{\quad \texttt{preamble}} | ||
|  | The \texttt{preamble} command lets a user have \TeX\ stuff inserted (by the | ||
|  | standard styles, at least) directly into the \texttt{.bbl} file.  It is | ||
|  | intended primarily for allowing \TeX\ macro definitions used within | ||
|  | the bibliography entries (for better sorting, for example).  One | ||
|  | \texttt{preamble} command per \texttt{.bib} file should suffice. | ||
|  | 
 | ||
|  | A \texttt{preamble} command has either braces or parentheses as outer | ||
|  | delimiters.  Inside is the preamble string, which has the same syntax | ||
|  | as a field value: a nonempty list of field tokens separated by | ||
|  | [[concat_char]]s.  There are three types of field tokens---nonnegative | ||
|  | numbers, macro names, and delimited strings. | ||
|  | 
 | ||
|  | This module does all the scanning (that's not subcontracted), but the | ||
|  | \texttt{.bib}-specific scanning function | ||
|  | [[scan_and_push_the_field_value_and_eat_white]] actually stores the | ||
|  | value. | ||
|  | <<function definitions>>= | ||
|  | static bool do_preamble(Bibreader rdr) { | ||
|  |   ready_tok(rdr); | ||
|  |   <<scan past opening delimiter and set [[rdr->entry_close]]>> | ||
|  |   ready_tok(rdr); | ||
|  |   lua_rawgeti(rdr->L, LUA_REGISTRYINDEX, rdr->preamble); | ||
|  |   lua_pushnumber(rdr->L, lua_objlen(rdr->L, -1) + 1); | ||
|  |   if (!scan_and_push_the_field_value(rdr, 0)) | ||
|  |     return 0; | ||
|  |   ready_tok(rdr); | ||
|  |   if (*rdr->cur != rdr->entry_close) | ||
|  |     LERRFB("Missing '%c' in preamble command", rdr->entry_close); | ||
|  |   rdr->cur++; | ||
|  |   lua_settable(rdr->L, -3); | ||
|  |   lua_pop(rdr->L, 1); /* remove preamble */ | ||
|  |   return 1; | ||
|  | } | ||
|  | @ | ||
|  | %% \webindexsort{database-file commands}{\quad \texttt{string}} | ||
|  | The \texttt{string} command is implemented both for SCRIBE compatibility | ||
|  | and for allowing a user: to override a \texttt{.bst}-file \texttt{macro} | ||
|  | command, to define one that the \texttt{.bst} file doesn't, or to engage in | ||
|  | good, wholesome, typing laziness. | ||
|  | 
 | ||
|  | The \texttt{string} command does mostly the same thing as the | ||
|  | \texttt{.bst}-file's \texttt{macro} command (but the syntax is different and the | ||
|  | \texttt{string} command compresses white space).  In fact, later in this | ||
|  | program, the term ``macro'' refers to either a \texttt{.bst} ``macro'' or a | ||
|  | \texttt{.bib} ``string'' (when it's clear from the context that it's not | ||
|  | a \texttt{WEB} macro). | ||
|  | 
 | ||
|  | A \texttt{string} command has either braces or parentheses as outer | ||
|  | delimiters.  Inside is the string's name (it must be a legal | ||
|  | identifier, and case differences are ignored---all upper-case letters | ||
|  | are converted to lower case), then an equals sign, and the string's | ||
|  | definition, which has the same syntax as a field value: a nonempty | ||
|  | list of field tokens separated by [[concat_char]]s.  There are three | ||
|  | types of field tokens---nonnegative numbers, macro names, and | ||
|  | delimited strings. | ||
|  | <<function definitions>>= | ||
|  | static bool do_string(Bibreader rdr) { | ||
|  |   unsigned char *id; | ||
|  |   int keyindex; | ||
|  |   ready_tok(rdr); | ||
|  |   <<scan past opening delimiter and set [[rdr->entry_close]]>> | ||
|  |   ready_tok(rdr); | ||
|  |   id = rdr->cur; | ||
|  |   if (!scan_identifier(rdr, '=', '=', '=')) | ||
|  |     LERRB("Expected a string name followed by '='"); | ||
|  |   lower_case(id, rdr->cur); | ||
|  |   lua_pushlstring(rdr->L, (char *)id, rdr->cur - id); | ||
|  |   keyindex = lua_gettop(rdr->L); | ||
|  |   ready_tok(rdr); | ||
|  |   if (*rdr->cur != '=') | ||
|  |     LERRB("Expected a string name followed by '='"); | ||
|  |   rdr->cur++; | ||
|  |   ready_tok(rdr); | ||
|  |   if (!scan_and_push_the_field_value(rdr, keyindex)) | ||
|  |     return 0; | ||
|  |   ready_tok(rdr); | ||
|  |   if (*rdr->cur != rdr->entry_close) | ||
|  |     LERRFB("Missing '%c' in macro definition", rdr->entry_close); | ||
|  |   rdr->cur++; | ||
|  |   lua_getref(rdr->L, rdr->macros); | ||
|  |   lua_insert(rdr->L, -3); | ||
|  |   lua_settable(rdr->L, -3); | ||
|  |   lua_pop(rdr->L, 1); | ||
|  |   return 1; | ||
|  | } | ||
|  | @  | ||
|  | \subsection{Interface to Lua} | ||
|  | 
 | ||
|  | First, we define Lua access to a reader. | ||
|  | <<function definitions>>=   | ||
|  | static Bibreader checkreader(lua_State *L, int index) { | ||
|  |   return luaL_checkudata(L, index, "bibtex.reader"); | ||
|  | } | ||
|  | @  | ||
|  | The reader's [[__index]] metamethod provides access to the | ||
|  | [[entry_line]] and [[preamble]] values as if they were fields of the | ||
|  | Lua table.   | ||
|  | It also provides access to the [[next]] and [[close]] methods of the | ||
|  | reader object. | ||
|  | <<function definitions>>= | ||
|  | static int reader_meta_index(lua_State *L) { | ||
|  |   Bibreader rdr = checkreader(L, 1); | ||
|  |   const char *key; | ||
|  |   if (!lua_isstring(L, 2)) | ||
|  |     return 0; | ||
|  |   key = lua_tostring(L, 2); | ||
|  |   if (!strcmp(key, "next")) | ||
|  |     lua_pushcfunction(L, next_entry); | ||
|  |   else if (!strcmp(key, "entry_line")) | ||
|  |     lua_pushnumber(L, rdr->entry_line); | ||
|  |   else if (!strcmp(key, "preamble")) | ||
|  |     lua_rawgeti(L, LUA_REGISTRYINDEX, rdr->preamble); | ||
|  |   else if (!strcmp(key, "close")) | ||
|  |     lua_pushcfunction(L, closereader); | ||
|  |   else | ||
|  |     lua_pushnil(L); | ||
|  |   return 1; | ||
|  | } | ||
|  | @  | ||
|  | Here are the functions exported in the [[bibtex]] module: | ||
|  | <<function prototypes>>= | ||
|  | static int openreader(lua_State *L); | ||
|  | static int next_entry(lua_State *L); | ||
|  | static int closereader(lua_State *L); | ||
|  | <<initialized and uninitialized data>>= | ||
|  | static const struct luaL_reg bibtexlib [] = { | ||
|  |   {"open", openreader}, | ||
|  |   {"close", closereader}, | ||
|  |   {"next", next_entry}, | ||
|  |   {NULL, NULL} | ||
|  | }; | ||
|  | @  | ||
|  | \newcommand\nt[1]{\rmfamily{\emph{#1}}} | ||
|  | \newcommand\optional[1]{\rmfamily{[}#1\rmfamily{]}} | ||
|  | 
 | ||
|  | To create a reader, we call | ||
|  | \begin{quote} | ||
|  | \texttt{openreader(\nt{filename},  | ||
|  | \optional{\nt{macro-table}, \optional{\nt{warn-function}}})} | ||
|  | \end{quote} | ||
|  | 
 | ||
|  | The warning function will be called in one of the following ways: | ||
|  | \begin{itemize} | ||
|  | \item | ||
|  | warn([["extra field"]], \emph{file}, \emph{line}, \emph{citation-key}, | ||
|  | \emph{field-name}, \emph{field-value}) | ||
|  | 
 | ||
|  | Duplicate definition of a field in a single entry. | ||
|  | \item | ||
|  | warn([["undefined macro"]], \emph{file}, \emph{line}, \emph{citation-key}, | ||
|  | \emph{macro-name}) | ||
|  | 
 | ||
|  | Use of an undefined macro. | ||
|  | \end{itemize} | ||
|  | <<function definitions>>= | ||
|  | #define INBUF 128  /* initial size of input buffer */ | ||
|  | /* filename * macro table * warning function -> reader */ | ||
|  | static int openreader(lua_State *L) { | ||
|  |   const char *filename = luaL_checkstring(L, 1); | ||
|  |   FILE *f = fopen(filename, "r"); | ||
|  |   Bibreader rdr; | ||
|  |   if (!f) { | ||
|  |     lua_pushnil(L); | ||
|  |     lua_pushfstring(L, "Could not open file '%s'", filename); | ||
|  |     return 2; | ||
|  |   } | ||
|  | 
 | ||
|  |   <<set items 2 and 3 on stack to hold macro table and optional warning function>> | ||
|  | 
 | ||
|  |   rdr = lua_newuserdata(L, sizeof(*rdr)); | ||
|  |   luaL_getmetatable(L, "bibtex.reader"); | ||
|  |   lua_setmetatable(L, -2); | ||
|  | 
 | ||
|  |   rdr->line_num = 0; | ||
|  |   rdr->buf = rdr->cur = rdr->lim = malloc(INBUF); | ||
|  |   rdr->bufsize = INBUF; | ||
|  |   rdr->file = f; | ||
|  |   rdr->filename = malloc(lua_strlen(L, 1)+1); | ||
|  |   assert(rdr->filename); | ||
|  |   strncpy((char *)rdr->filename, filename, lua_strlen(L, 1)+1); | ||
|  |   rdr->L = L; | ||
|  |   lua_newtable(L); | ||
|  |   rdr->preamble = luaL_ref(L, LUA_REGISTRYINDEX); | ||
|  |   lua_pushvalue(L, 2); | ||
|  |   rdr->macros = luaL_ref(L, LUA_REGISTRYINDEX); | ||
|  |   lua_pushvalue(L, 3); | ||
|  |   rdr->warning = luaL_ref(L, LUA_REGISTRYINDEX); | ||
|  |   return 1; | ||
|  | } | ||
|  | @  | ||
|  | <<set items 2 and 3 on stack to hold macro table and optional warning function>>= | ||
|  | if (lua_type(L, 2) == LUA_TNONE) | ||
|  |   lua_newtable(L); | ||
|  | 
 | ||
|  | if (lua_type(L, 3) == LUA_TNONE) | ||
|  |   lua_pushnil(L); | ||
|  | else if (!lua_isfunction(L, 3)) | ||
|  |   luaL_error(L, "Warning value to bibtex.open is not a function"); | ||
|  | @      | ||
|  | Reader method [[next_entry]] takes no parameters. | ||
|  | On success it returns a triple (\emph{type}, \emph{key}, | ||
|  | \emph{field-table}). | ||
|  | On error it returns (\texttt{false}, \emph{message}). | ||
|  | On end of file it returns nothing. | ||
|  | <<function definitions>>= | ||
|  | static int next_entry(lua_State *L) { | ||
|  |   Bibreader rdr = checkreader(L, 1); | ||
|  |   if (!rdr->file) | ||
|  |     luaL_error(L, "Tried to read from closed bibtex.reader"); | ||
|  |   return get_bib_command_or_entry_and_process(rdr); | ||
|  | }   | ||
|  | @  | ||
|  | Closing a reader recovers its resources; | ||
|  | the [[file]] field of a closed reader is [[NULL]]. | ||
|  | <<function definitions>>= | ||
|  | static int closereader(lua_State *L) { | ||
|  |   Bibreader rdr = checkreader(L, 1); | ||
|  |   if (!rdr->file) | ||
|  |     luaL_error(L, "Tried to close closed bibtex.reader"); | ||
|  |   fclose(rdr->file); | ||
|  |   rdr->file = NULL; | ||
|  |   free(rdr->buf); | ||
|  |   rdr->buf = rdr->cur = rdr->lim = NULL; | ||
|  |   rdr->bufsize = 0; | ||
|  |   free((void*)rdr->filename); | ||
|  |   rdr->filename = NULL; | ||
|  |   rdr->L = NULL; | ||
|  |   luaL_unref(L, LUA_REGISTRYINDEX, rdr->preamble); | ||
|  |   rdr->preamble = 0; | ||
|  |   luaL_unref(L, LUA_REGISTRYINDEX, rdr->warning); | ||
|  |   rdr->warning = 0; | ||
|  |   luaL_unref(L, LUA_REGISTRYINDEX, rdr->macros); | ||
|  |   rdr->macros = 0; | ||
|  |   return 0; | ||
|  | }   | ||
|  | @  | ||
|  | To help implement the call to the warning function, we have [[warnv]]. | ||
|  | If there is no warning function, we return the nubmer of nils specified by [[nres]].  | ||
|  | <<function definitions>>= | ||
|  | static void warnv(Bibreader rdr, int nres, const char *fmt, ...) { | ||
|  |   const char *p; | ||
|  |   va_list vl; | ||
|  |      | ||
|  |   lua_rawgeti(rdr->L, LUA_REGISTRYINDEX, rdr->warning); | ||
|  |   if (lua_isnil(rdr->L, -1)) { | ||
|  |     lua_pop(rdr->L, 1); | ||
|  |     while (nres-- > 0) | ||
|  |       lua_pushnil(rdr->L); | ||
|  |   } else { | ||
|  |     va_start(vl, fmt); | ||
|  |     for (p = fmt; *p; p++) | ||
|  |       switch (*p) { | ||
|  |         case 'f': lua_pushnumber(rdr->L, va_arg(vl, double)); break; | ||
|  |         case 'd': lua_pushnumber(rdr->L, va_arg(vl, int)); break; | ||
|  |         case 's': { | ||
|  |           const char *s = va_arg(vl, char *); | ||
|  |           if (s == NULL) lua_pushnil(rdr->L); | ||
|  |           else lua_pushstring(rdr->L, s); | ||
|  |           break; | ||
|  |         } | ||
|  |         default: luaL_error(rdr->L, "invalid parameter type %c", *p); | ||
|  |       } | ||
|  |     lua_call(rdr->L, p - fmt, nres); | ||
|  |     va_end(vl); | ||
|  |   } | ||
|  | } | ||
|  | @  | ||
|  | Here's where the library is initialized. | ||
|  | This is the only exported function in the whole file. | ||
|  | <<function definitions>>= | ||
|  | int luaopen_bibtex (lua_State *L) { | ||
|  |   luaL_newmetatable(L, "bibtex.reader"); | ||
|  |   lua_pushstring(L, "__index"); | ||
|  |   lua_pushcfunction(L, reader_meta_index);  /* pushes the index method */ | ||
|  |   lua_settable(L, -3);  /* metatable.__index = metatable */ | ||
|  | 
 | ||
|  |   luaL_register(L, "bibtex", bibtexlib); | ||
|  |   <<initialize the [[is_id_char]] table>> | ||
|  |   return 1; | ||
|  | } | ||
|  | @  | ||
|  | In an identifier, we can accept any printing character except the ones | ||
|  | listed in the [[nonids]] string. | ||
|  | <<initialize the [[is_id_char]] table>>= | ||
|  | { | ||
|  |   unsigned c; | ||
|  |   static unsigned char *nonids = (unsigned char *)"\"#%'(),={} \t\n\f"; | ||
|  |   unsigned char *p; | ||
|  | 
 | ||
|  |   for (c = 0; c <= 0377; c++) | ||
|  |     is_id_char[c] = 1; | ||
|  |   for (c = 0; c <= 037; c++) | ||
|  |     is_id_char[c] = 0; | ||
|  |   for (p = nonids; *p; p++) | ||
|  |     is_id_char[*p] = 0; | ||
|  | } | ||
|  | @  | ||
|  | \subsection{Main function for the nbib commands} | ||
|  | 
 | ||
|  | This code will is the standalone main function for all the nbib commands. | ||
|  | \nextchunklabel{c-main} | ||
|  | <<nbibtex.c>>= | ||
|  | #include <stdlib.h> | ||
|  | #include <stdio.h> | ||
|  | 
 | ||
|  | #include <lua.h> | ||
|  | #include <lualib.h> | ||
|  | #include <lauxlib.h> | ||
|  | 
 | ||
|  | extern int luaopen_bibtex(lua_State *L); | ||
|  | extern int luaopen_boyer_moore (lua_State *L); | ||
|  | 
 | ||
|  | int main (int argc, char *argv[]) { | ||
|  |   int i, rc; | ||
|  |   lua_State *L = luaL_newstate(); | ||
|  |   static const char* files[] = { SHARE "/bibtex.lua",  SHARE "/natbib.nbs" }; | ||
|  | 
 | ||
|  |   #define OPEN(N) lua_pushcfunction(L, luaopen_ ## N); lua_call(L, 0, 0) | ||
|  |   OPEN(base); OPEN(table); OPEN(io); OPEN(package); OPEN(string); OPEN(bibtex); | ||
|  |   OPEN(boyer_moore); | ||
|  | 
 | ||
|  |   for (i = 0; i < sizeof(files)/sizeof(files[0]); i++) { | ||
|  |     if (luaL_dofile(L, files[i])) { | ||
|  |       fprintf(stderr, "%s: error loading configuration file %s\n", | ||
|  |               argv[0], files[i]); | ||
|  |       exit(2); | ||
|  |     } | ||
|  |   } | ||
|  |   lua_pushstring(L, "bibtex"); | ||
|  |   lua_gettable(L, LUA_GLOBALSINDEX); | ||
|  |   lua_pushstring(L, "main"); | ||
|  |   lua_gettable(L, -2); | ||
|  |   lua_newtable(L); | ||
|  |   for (i = 0; i < argc; i++) { | ||
|  |     lua_pushnumber(L, i); | ||
|  |     lua_pushstring(L, argv[i]); | ||
|  |     lua_settable(L, -3); | ||
|  |   } | ||
|  |   rc = lua_pcall(L, 1, 0, 0); | ||
|  |   if (rc) { | ||
|  |     fprintf(stderr, "Call failed: %s\n", lua_tostring(L, -1)); | ||
|  |     lua_pop(L, 1); | ||
|  |   } | ||
|  |   lua_close(L); | ||
|  |   return rc; | ||
|  | } | ||
|  | @ | ||
|  | \section{Implementation of \texttt{nbibtex}} | ||
|  | 
 | ||
|  | From here out, everything is written in Lua (\url{http://www.lua.org}). | ||
|  | The main module is [[bibtex]], and style-file support is in the | ||
|  | submodule [[bibtex.bst]]. | ||
|  | Each has a [[doc]] submodule, which is intended as machine-readable | ||
|  | documentation.  | ||
|  | <<bibtex.lua>>= | ||
|  | <<if not already present, load the C code for the [[bibtex]] module>> | ||
|  | 
 | ||
|  | local config = config or { } --- may be defined by config process | ||
|  | 
 | ||
|  | local workaround = { | ||
|  |   badbibs = true,  --- don't look at bad .bib files that come with teTeX | ||
|  | } | ||
|  | local bst = { }  | ||
|  | bibtex.bst = bst | ||
|  | 
 | ||
|  | bibtex.doc = { } | ||
|  | bibtex.bst.doc = { } | ||
|  | 
 | ||
|  | bibtex.doc.bst = '# table of functions used to write style files' | ||
|  | @ | ||
|  | Not much code is executed during startup, so the main issue is to | ||
|  | manage declaration before use. | ||
|  | I~have a few forward declarations in  | ||
|  | [[<<declarations of internal functions>>]]; otherwise, count only on | ||
|  | ``utility'' functions being declared before ``exported'' ones. | ||
|  | <<bibtex.lua>>= | ||
|  | local find = string.find | ||
|  | <<declarations of internal functions>> | ||
|  | <<Lua utility functions>> | ||
|  | <<exported Lua functions>> | ||
|  | <<check constant values for consistency>> | ||
|  | 
 | ||
|  | return bibtex | ||
|  | @  | ||
|  | The Lua code relies on the C~code. | ||
|  | How we get the C~code depends on how | ||
|  |  \texttt{bibtex.lua} is used; there are two alternatives: | ||
|  | \begin{itemize} | ||
|  | \item | ||
|  | In the distribution, \texttt{bibtex.lua} is loaded by the C~code in | ||
|  | chunk~\subpageref{c-main}, which defines the [[bibtex]] module. | ||
|  | \item | ||
|  | For standalone testing purposes, \texttt{bibtex.lua} can be loaded | ||
|  | directly into an  | ||
|  | interactive Lua interpreter, in which case it loads the [[bibtex]] | ||
|  | module as a shared library. | ||
|  | \end{itemize} | ||
|  | <<if not already present, load the C code for the [[bibtex]] module>>= | ||
|  | if not bibtex then | ||
|  |   local nbib = require 'nbib-bibtex' | ||
|  |   bibtex = nbib | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Error handling, warning messages, and logging} | ||
|  | <<Lua utility functions>>= | ||
|  | local function printf (...) return io.stdout:write(string.format(...)) end | ||
|  | local function eprintf(...) return io.stderr:write(string.format(...)) end | ||
|  | @  | ||
|  | I have to figure out what to do about errors --- the current code is bogus. | ||
|  | Among other things, I should be setting error levels. | ||
|  | <<Lua utility functions>>= | ||
|  | local function bibwarnf (...) eprintf(...); eprintf('\n') end | ||
|  | local function biberrorf(...) eprintf(...); eprintf('\n') end | ||
|  | local function bibfatalf(...) eprintf(...); eprintf('\n'); os.exit(2) end | ||
|  | @  | ||
|  | Logging?  What logging? | ||
|  | <<Lua utility functions>>= | ||
|  | local function logf() end | ||
|  | @  | ||
|  | \subsubsection{Support for delayed warnings} | ||
|  | 
 | ||
|  | Like classic \bibtex, \nbibtex\ typically warns only about entries | ||
|  | that are actually used. | ||
|  | This functionality is implemented by function [[hold_warning]], which | ||
|  | keeps warnings on ice until they are either returned by | ||
|  | [[held_warnings]] or thrown away by [[drop_warning]]. | ||
|  | The function [[emit_warning]] emits a warning message eagerly when | ||
|  | called; | ||
|  | it is used to issue warnings about entries we actually use, or if the | ||
|  | [[-strict]] option is given, to issue every warning. | ||
|  | <<Lua utility functions>>= | ||
|  | local hold_warning  -- function suitable to pass to bibtex.open; holds | ||
|  | local emit_warning  -- function suitable to pass to bibtex.open; prints | ||
|  | local held_warnings -- returns nil or list of warnings since last call | ||
|  | local drop_warnings -- drops warnings | ||
|  | 
 | ||
|  | local extra_ok = { reffrom = true } | ||
|  |    -- set of fields about which we should not warn of duplicates | ||
|  | 
 | ||
|  | do | ||
|  |   local warnfuns = { } | ||
|  |   warnfuns["extra field"] = | ||
|  |     function(file, line, cite, field, newvalue) | ||
|  |       if not extra_ok[field] then  | ||
|  |         bibwarnf("Warning--I'm ignoring %s's extra \"%s\" field\n--line %d of file %s\n", | ||
|  |                  cite, field, line, file) | ||
|  |       end | ||
|  |     end | ||
|  | 
 | ||
|  |   warnfuns["undefined macro"] = | ||
|  |     function(file, line, cite, macro) | ||
|  |       bibwarnf("Warning--string name \"%s\" is undefined\n--line %d of file %s\n", | ||
|  |                macro, line, file) | ||
|  |     end | ||
|  | 
 | ||
|  |   function emit_warning(tag, ...) | ||
|  |     return assert(warnfuns[tag])(...) | ||
|  |   end | ||
|  | 
 | ||
|  |   local held | ||
|  |   function hold_warning(...) | ||
|  |     held = held or { } | ||
|  |     table.insert(held, { ... }) | ||
|  |   end | ||
|  |   function held_warnings() | ||
|  |     local h = held | ||
|  |     held = nil | ||
|  |     return h | ||
|  |   end | ||
|  |   function drop_warnings() | ||
|  |     held = nil | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Miscellany} | ||
|  | All this stuff is dubious. | ||
|  | <<Lua utility functions>>= | ||
|  | function table.copy(t) | ||
|  |   local u = { } | ||
|  |   for k, v in pairs(t) do u[k] = v end | ||
|  |   return u | ||
|  | end | ||
|  | @  | ||
|  | <<Lua utility functions>>= | ||
|  | local function open(f, m, what) | ||
|  |   local f, msg = io.open(f, m) | ||
|  |   if f then | ||
|  |     return f | ||
|  |   else | ||
|  |     (what or bibfatalf)('Could not open file %s: %s', f, msg) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<exported Lua functions>>= | ||
|  | local function entries(rdr, empty) | ||
|  |   assert(not empty) | ||
|  |   return function() return rdr:next() end | ||
|  | end | ||
|  | 
 | ||
|  | bibtex.entries = entries | ||
|  | bibtex.doc.entries = 'reader -> iterator   # generate entries' | ||
|  | @  | ||
|  | \subsection{Internal documentation} | ||
|  | 
 | ||
|  | We attempt to document everything! | ||
|  | <<exported Lua functions>>= | ||
|  | function bibtex:show_doc(title) | ||
|  |   local out = bst.writer(io.stdout, 5) | ||
|  |   local function outf(...) return out:write(string.format(...)) end | ||
|  |   local allkeys, dkeys = { }, { } | ||
|  |   for k, _ in pairs(self)     do table.insert(allkeys, k) end | ||
|  |   for k, _ in pairs(self.doc) do table.insert(dkeys,   k) end | ||
|  |   table.sort(allkeys) | ||
|  |   table.sort(dkeys) | ||
|  |   for i = 1, table.getn(dkeys) do | ||
|  |     outf("%s.%-12s : %s\n", title, dkeys[i], self.doc[dkeys[i]]) | ||
|  |   end | ||
|  |   local header | ||
|  |   for i = 1, table.getn(allkeys) do | ||
|  |     local k = allkeys[i] | ||
|  |     if k ~= "doc" and k ~= "show_doc" and not self.doc[k] then | ||
|  |       if not header then | ||
|  |         outf('Undocumented keys in table %s:', title) | ||
|  |         header = true | ||
|  |       end | ||
|  |       outf(' %s', k) | ||
|  |     end | ||
|  |   end | ||
|  |   if header then outf('\n') end | ||
|  | end | ||
|  | bibtex.bst.show_doc = bibtex.show_doc | ||
|  | @      | ||
|  | Here is the documentation for what's defined in C~code: | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.open  = 'filename -> reader # open a reader for a .bib file' | ||
|  | bibtex.doc.close = 'reader -> unit     # close open reader' | ||
|  | bibtex.doc.next  = 'reader -> type * key * field table # read an entry' | ||
|  | @ | ||
|  | \subsection{Main function for \texttt{nbibtex}} | ||
|  | 
 | ||
|  | Actually, the same main function does for both \texttt{nbibtex} and | ||
|  | \texttt{nbibfind}; depending on how the program is called, it | ||
|  | delegates to [[bibtex.bibtex]] or [[bibtex.run_find]]. | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.main = 'string list -> unit # main program that dispatches on argv[0]' | ||
|  | function bibtex.main(argv) | ||
|  |   if argv[1] == '-doc' then -- undocumented internal doco | ||
|  |     bibtex:show_doc('bibtex') | ||
|  |     bibtex.bst:show_doc('bst') | ||
|  |   elseif find(argv[0], 'bibfind$') then | ||
|  |     return bibtex.run_find(argv) | ||
|  |   elseif find(argv[0], 'bibtex$') then | ||
|  |     return bibtex.bibtex(argv) | ||
|  |   else | ||
|  |     error("Call me something ending in 'bibtex' or 'bibfind'; when called\n  ".. | ||
|  |           argv[0]..", I don't know what to do") | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | <<exported Lua functions>>= | ||
|  | local permissive    = false -- nbibtex extension (ignore missing .bib files, etc.) | ||
|  | local strict        = false -- complain eagerly about errors in .bib files | ||
|  | local min_crossrefs = 2     -- how many crossref's required to add an entry? | ||
|  | local output_name   = nil   -- output file if not default | ||
|  | local bib_out       = false -- output .bib format | ||
|  | 
 | ||
|  | bibtex.doc.bibtex = 'string list -> unit # main program for nbibtex' | ||
|  | function bibtex.bibtex(argv) | ||
|  |   <<set bibtex options from [[argv]]>> | ||
|  |   if table.getn(argv) < 1 then | ||
|  |     bibfatalf('Usage: %s [-permissive|-strict|...] filename[.aux] [bibfile...]', | ||
|  |               argv[0]) | ||
|  |   end | ||
|  |   local auxname = table.remove(argv, 1) | ||
|  |   local basename = string.gsub(string.gsub(auxname, '%.aux$', ''), '%.$', '') | ||
|  |   auxname = basename .. '.aux' | ||
|  |   local bblname = output_name or (basename .. '.bbl') | ||
|  |   local blgname = basename .. (output_name and '.nlg' or '.blg') | ||
|  |   local blg = open(blgname, 'w') | ||
|  | 
 | ||
|  |   -- Here's what we accumulate by reading .aux files:  | ||
|  |   local bibstyle           -- the bibliography style | ||
|  |   local bibfiles = { }     -- list of files named in order of file | ||
|  |   local citekeys = { }     -- list of citation keys from .aux | ||
|  |                            -- (in order seen, mixed case, no duplicates) | ||
|  |   local cited_star = false -- .tex contains \cite{*} or \nocite{*} | ||
|  | 
 | ||
|  |   <<using file [[auxname]], set [[bibstyle]], [[citekeys]], and [[bibfiles]]>> | ||
|  | 
 | ||
|  |   if table.getn(argv) > 0 then -- override the bibfiles listed in the .aux file | ||
|  |     bibfiles = argv | ||
|  |   end | ||
|  |   <<validate contents of [[bibstyle]], [[citekeys]], and [[bibfiles]]>> | ||
|  |   <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>> | ||
|  |   blg:close() | ||
|  | end | ||
|  | @  | ||
|  | Options are straightforward. | ||
|  | <<set bibtex options from [[argv]]>>= | ||
|  | while table.getn(argv) > 0 and find(argv[1], '^%-') do | ||
|  |   if argv[1] == '-terse' then | ||
|  |     -- do nothing | ||
|  |   elseif argv[1] == '-permissive' then | ||
|  |     permissive = true | ||
|  |   elseif argv[1] == '-strict' then | ||
|  |     strict = true | ||
|  |   elseif argv[1] == '-min-crossrefs' and find(argv[2], '^%d+$') then | ||
|  |     min_crossrefs = assert(tonumber(argv[2])) | ||
|  |     table.remove(argv, 1) | ||
|  |   elseif string.find(argv[1], '^%-min%-crossrefs=(%d+)$') then | ||
|  |     local _, _, n = string.find(argv[1], '^%-min%-crossrefs=(%d+)$') | ||
|  |     min_crossrefs = assert(tonumber(n)) | ||
|  |   elseif string.find(argv[1], '^%-min%-crossrefs') then | ||
|  |     biberrorf("Ill-formed option %s", argv[1]) | ||
|  |   elseif argv[1] == '-o' then | ||
|  |     output_name = assert(argv[2]) | ||
|  |     table.remove(argv, 1) | ||
|  |   elseif argv[1] == '-bib' then | ||
|  |     bib_out = true | ||
|  |   elseif argv[1] == '-help' then | ||
|  |     help() | ||
|  |   elseif argv[1] == '-version' then | ||
|  |     printf("nbibtex version <VERSION>\n") | ||
|  |     os.exit(0) | ||
|  |   else | ||
|  |     biberrorf('Unknown option %s', argv[1]) | ||
|  |     help(2) | ||
|  |   end | ||
|  |   table.remove(argv, 1) | ||
|  | end | ||
|  | @  | ||
|  | <<Lua utility functions>>= | ||
|  | local function help(code) | ||
|  |   printf([[ | ||
|  | Usage: nbibtex [OPTION]... AUXFILE[.aux] [BIBFILE...] | ||
|  |   Write bibliography for entries in AUXFILE to AUXFILE.bbl. | ||
|  | 
 | ||
|  | Options: | ||
|  |   -bib                   write output as BibTeX source | ||
|  |   -help                  display this help and exit | ||
|  |   -o FILE                write output to FILE (- for stdout) | ||
|  |   -min-crossrefs=NUMBER  include item after NUMBER cross-refs; default 2 | ||
|  |   -permissive            allow missing bibfiles and (some) duplicate entries | ||
|  |   -strict                complain about any ill-formed entry we see | ||
|  |   -version               output version information and exit | ||
|  | 
 | ||
|  | Home page at http://www.eecs.harvard.edu/~nr/nbibtex. | ||
|  | Email bug reports to nr@eecs.harvard.edu. | ||
|  | ]]) | ||
|  |   os.exit(code or 0) | ||
|  | end | ||
|  | @ | ||
|  | \subsection{Reading all the aux files and validating the inputs} | ||
|  | 
 | ||
|  | We pay attention to four commands: [[\@input]], [[\bibdata]],  | ||
|  | [[\bibstyle]], and [[\citation]]. | ||
|  | <<using file [[auxname]], set [[bibstyle]], [[citekeys]], and [[bibfiles]]>>= | ||
|  | do | ||
|  |   local commands = { } -- table of commands we recognize in .aux files | ||
|  |   local function do_nothing() end -- default for unrecognized commands | ||
|  |   setmetatable(commands, { __index = function() return do_nothing end }) | ||
|  |   <<functions for commands found in .aux files>> | ||
|  |   commands['@input'](auxname)  -- reads all the variables | ||
|  | end | ||
|  | @  | ||
|  | <<functions for commands found in .aux files>>= | ||
|  | do | ||
|  |   local auxopened = { }  --- map filename to true/false | ||
|  | 
 | ||
|  |   commands['@input'] = function (auxname) | ||
|  |     if not find(auxname, '%.aux$') then | ||
|  |       bibwarnf('Name of auxfile "%s" does not end in .aux\n', auxname) | ||
|  |     end | ||
|  |     <<mark [[auxname]] as opened (but fail if opened already)>> | ||
|  |     local aux = open(auxname, 'r') | ||
|  |     logf('Top-level aux file: %s\n', auxname) | ||
|  |     for line in aux:lines() do | ||
|  |       local _, _, cmd, arg = find(line, '^\\([%a%@]+)%s*{([^%}]+)}%s*$') | ||
|  |       if cmd then commands[cmd](arg) end | ||
|  |     end | ||
|  |     aux:close() | ||
|  |   end | ||
|  | end | ||
|  | <<mark [[auxname]] as opened (but fail if opened already)>>= | ||
|  | if auxopened[auxname] then | ||
|  |   error("File " .. auxname .. " cyclically \\@input's itself") | ||
|  | else | ||
|  |   auxopened[auxname] = true | ||
|  | end | ||
|  | @ | ||
|  | \bibtex\ expects \texttt{.bib} files to be separated by commas. | ||
|  | They are forced to lower case, should have no spaces in them, | ||
|  | and the [[\bibdata]] command should appear exactly once. | ||
|  | <<functions for commands found in .aux files>>= | ||
|  | do | ||
|  |   local bibdata_seen = false | ||
|  | 
 | ||
|  |   function commands.bibdata(arg) | ||
|  |     assert(not bibdata_seen, [[LaTeX provides multiple \bibdata commands]]) | ||
|  |     bibdata_seen = true | ||
|  |     for bib in string.gmatch(arg, '[^,]+') do | ||
|  |       assert(not find(bib, '%s'), 'bibname from LaTeX contains whitespace') | ||
|  |       table.insert(bibfiles, string.lower(bib)) | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | The style should be unique, and it should be known to us. | ||
|  | <<functions for commands found in .aux files>>= | ||
|  | function commands.bibstyle(stylename) | ||
|  |   if bibstyle then | ||
|  |     biberrorf('Illegal, another \\bibstyle command') | ||
|  |   else | ||
|  |     bibstyle = bibtex.style(string.lower(stylename)) | ||
|  |     if not bibstyle then | ||
|  |       bibfatalf('There is no nbibtex style called "%s"') | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @     | ||
|  | We accumulated cited keys in [[citekeys]]. | ||
|  | Keys may be duplicated, but the input should not contain two keys that | ||
|  | differ only in case. | ||
|  | <<functions for commands found in .aux files>>= | ||
|  | do | ||
|  |   local keys_seen, lower_seen = { }, { } -- which keys have been seen already | ||
|  | 
 | ||
|  |   function commands.citation(arg) | ||
|  |     for key in string.gmatch(arg, '[^,]+') do | ||
|  |       assert(not find(key, '%s'), | ||
|  |              'Citation key {' .. key .. '} from LaTeX contains whitespace') | ||
|  |       if key == '*' then | ||
|  |         cited_star = true | ||
|  |       elseif not keys_seen[key] then --- duplicates are OK | ||
|  |         keys_seen[key] = true | ||
|  |         local low = string.lower(key) | ||
|  |         <<if another key with same lowercase, complain bitterly>> | ||
|  |         if not cited_star then -- no more insertions after the star | ||
|  |           table.insert(citekeys, key) -- must be key, not low,  | ||
|  |                                       -- so that keys in .bbl match .aux | ||
|  |         end | ||
|  |       end | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<if another key with same lowercase, complain bitterly>>= | ||
|  | if lower_seen[low] then | ||
|  |   biberrorf("Citation key '%s' inconsistent with earlier key '%s'", | ||
|  |             key, lower_seen[low]) | ||
|  | else | ||
|  |   lower_seen[low] = key | ||
|  | end | ||
|  | @ | ||
|  | After reading the variables, we do a little validation. | ||
|  | I~can't seem to make up my mind what should be done incrementally | ||
|  | while things are being read. | ||
|  | <<validate contents of [[bibstyle]], [[citekeys]], and [[bibfiles]]>>= | ||
|  | if not bibstyle then | ||
|  |   bibfatalf('No \\bibliographystyle in original LaTeX') | ||
|  | end | ||
|  | 
 | ||
|  | if table.getn(bibfiles) == 0 then | ||
|  |   bibfatalf('No .bib files specified --- no \\bibliography in original LaTeX?') | ||
|  | end | ||
|  | 
 | ||
|  | if table.getn(citekeys) == 0 and not cited_star then | ||
|  |   biberrorf('No citations in document --- empty bibliography') | ||
|  | end | ||
|  | 
 | ||
|  | do --- check for duplicate bib entries | ||
|  |   local i = 1 | ||
|  |   local seen = { } | ||
|  |   while i <= table.getn(bibfiles) do | ||
|  |     local bib = bibfiles[i] | ||
|  |     if seen[bib] then | ||
|  |       bibwarnf('Multiple references to bibfile "%s"', bib) | ||
|  |       table.remove(bibfiles, i) | ||
|  |     else | ||
|  |       i = i + 1 | ||
|  |     end | ||
|  |   end | ||
|  | end     | ||
|  | @ | ||
|  | \subsection{Reading the entries from all the \bibtex\ files} | ||
|  | 
 | ||
|  | These are diagnostics that might be written to a log. | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | logf("bibstyle == %q\n", bibstyle.name) | ||
|  | logf("consult these bibfiles:") | ||
|  | for _, bib in ipairs(bibfiles) do logf(" %s", bib) end | ||
|  | logf("\ncite these papers:\n") | ||
|  | for _, key in ipairs(citekeys) do logf("  %s\n", key) end | ||
|  | if cited_star then logf("  and everything else in the database\n") end | ||
|  | @ | ||
|  | @  | ||
|  | Each bibliography file is opened with [[openbib]]. | ||
|  | Unlike classic \bibtex, we can't simply select the first entry | ||
|  | matching a citation key. | ||
|  | Instead, we read all entries into [[bibentries]] and do searches later. | ||
|  | 
 | ||
|  | The easy case is when we're not permissive: we put all the entries | ||
|  | into one list, just as if they had come from a single \texttt{.bib} file. | ||
|  | But if we're permissive, duplicates in different bibfiles are OK: we | ||
|  | will search one bibfile after another and stop after the first | ||
|  | successful search---thus instead of a single list, we have a list of | ||
|  | lists.  | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | local bibentries = { } -- if permissive, list of lists, else list of entries | ||
|  | local dupcheck = { } -- maps lower key to entry | ||
|  | local preamble = { } -- accumulates preambles from all .bib files | ||
|  | local got_one_bib = false -- did we open even one .bib file? | ||
|  | <<definition of function [[openbib]], which sets [[get_one_bib]] if successful>> | ||
|  | 
 | ||
|  | local warnings = { }  -- table of held warnings for each entry | ||
|  | local macros = bibstyle.macros() -- must accumulate macros across .bib files | ||
|  | for _, bib in ipairs(bibfiles) do | ||
|  |   local bibfilename, rdr = openbib(bib, macros) | ||
|  |   if rdr then | ||
|  |     local t -- list that will receive entries from this reader | ||
|  |     if permissive then | ||
|  |       t = { } | ||
|  |       table.insert(bibentries, t) | ||
|  |     else | ||
|  |       t = bibentries | ||
|  |     end | ||
|  |     local localdupcheck = { } -- lower key to entry; finds duplicates within this file | ||
|  |     for type, key, fields, file, line in entries(rdr) do | ||
|  |       if type == nil then | ||
|  |         break | ||
|  |       elseif type then -- got something without error | ||
|  |         local e = { type = type, key = key, fields = fields, | ||
|  |                     file = bibfilename, line = rdr.entry_line } | ||
|  |         warnings[e] = held_warnings() | ||
|  |         <<definition of local function [[not_dup]]>> | ||
|  |         local ok1, ok2 = not_dup(localdupcheck), not_dup(dupcheck) -- evaluate both | ||
|  |         if ok1 and ok2 then  | ||
|  |           table.insert(t, e) | ||
|  |         end | ||
|  |       end | ||
|  |     end | ||
|  |     for _, l in ipairs(rdr.preamble) do table.insert(preamble, l) end | ||
|  |     rdr:close() | ||
|  |   end | ||
|  | end | ||
|  | 
 | ||
|  | if not got_one_bib then | ||
|  |   bibfatalf("Could not open any of the following .bib files: %s", | ||
|  |             table.concat(bibfiles, ' ')) | ||
|  | end | ||
|  | @ Because the preamble is accumulated as the \texttt{.bib} file is | ||
|  | read, it must be copied at the end. | ||
|  | @ | ||
|  | Here we open files. | ||
|  | If we're not being permissive, we must open each file successfully. | ||
|  | If we're permissive, it's enough to get at least one. | ||
|  | 
 | ||
|  | To find the pathname for a bib file, we use [[bibtex.bibpath]]. | ||
|  | <<definition of function [[openbib]], which sets [[get_one_bib]] if successful>>= | ||
|  | local function openbib(bib, macros) | ||
|  |   macros = macros or bibstyle.macros() | ||
|  |   local filename, msg = bibtex.bibpath(bib) | ||
|  |   if not filename then | ||
|  |     if not permissive then biberrorf("Cannot find file %s.bib", bib) end | ||
|  |     return | ||
|  |   end | ||
|  |   local rdr = bibtex.open(filename, macros, strict and emit_warning or hold_warning) | ||
|  |   if not rdr and not permissive then | ||
|  |     biberrorf("Cannot open file %s.bib", bib) | ||
|  |     return | ||
|  |   end | ||
|  |   got_one_bib = true | ||
|  |   return filename, rdr | ||
|  | end | ||
|  | @  | ||
|  | \subsubsection{Duplication checks} | ||
|  | 
 | ||
|  | There's a great deal of nuisance to checking the integrity of a | ||
|  | \texttt{.bib} file.  | ||
|  | <<definition of local function [[not_dup]]>>= | ||
|  | <<abstraction exporting [[savecomplaint]] and [[issuecomplaints]]>> | ||
|  | 
 | ||
|  | local k = string.lower(key) | ||
|  | local function not_dup(dup) | ||
|  |   local e1, e2 = dup[k], e | ||
|  |   if e1 then | ||
|  |     -- do return false end --- avoid extra msgs for now | ||
|  |     local diff = entries_differ(e1, e2) | ||
|  |     if diff then | ||
|  |       local verybad = not permissive or e1.file == e2.file | ||
|  |       local complain = verybad and biberrorf or bibwarnf | ||
|  |       if e1.key == e2.key then | ||
|  |         if verybad then | ||
|  |           savecomplaint(e1, e2, complain, | ||
|  |                         "Ignoring second entry with key '%s' on file %s, line %d\n" .. | ||
|  |                         "  (first entry occurred on file %s, line %d;\n".. | ||
|  |                         "   entries differ in %s)\n", | ||
|  |                         e2.key, e2.file, e2.line, e1.file, e1.line, diff) | ||
|  |         end | ||
|  |       else | ||
|  |         savecomplaint(e1, e2, complain, | ||
|  |            "Entries '%s' on file %s, line %d and\n  '%s' on file %s, line %d" .. | ||
|  |            " have keys that differ only in case\n", | ||
|  |            e1.key, e1.file, e1.line, e2.key, e2.file, e2.line) | ||
|  |       end | ||
|  |     elseif e1.file == e2.file then | ||
|  |       savecomplaint(e1, e2, bibwarnf, | ||
|  |         "Entry '%s' is duplicated in file '%s' at both line %d and line %d\n", | ||
|  |         e1.key, e1.file, e1.line, e2.line) | ||
|  |     elseif not permissive then | ||
|  |       savecomplaint(e1, e2, bibwarnf, | ||
|  |         "Entry '%s' appears both on file '%s', line %d and file '%s', line %d".. | ||
|  |         "\n  (entries are exact duplicates)\n", | ||
|  |         e1.key, e1.file, e1.line, e2.file, e2.line) | ||
|  |     end | ||
|  |     return false | ||
|  |   else | ||
|  |     dup[k] = e | ||
|  |     return true | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | Calling [[savecomplaint(e1, e2, complain, ...)]] takes the complaint | ||
|  | [[complain(...)]] and associates it with entries [[e1]] and [[e2]]. | ||
|  | If we are operating in ``strict'' mode, the complaint is issued right | ||
|  | away; otherwise | ||
|  | calling [[issuecomplaints(e)]] issues the complaint lazily. | ||
|  | In non-strict, lazy mode, the outside world arranges to issue only | ||
|  | complaints with entries that are actually used. | ||
|  | <<abstraction exporting [[savecomplaint]] and [[issuecomplaints]]>>= | ||
|  | local savecomplained, issuecomplaints | ||
|  | if strict then | ||
|  |   function savecomplaint(e1, e2, complain, ...) | ||
|  |     return complain(...) | ||
|  |   end | ||
|  |   function issuecomplaints(e) end | ||
|  | else | ||
|  |   local complaints = { } | ||
|  |   local function save(e, t) | ||
|  |     complaints[e] = complaints[e] or { } | ||
|  |     table.insert(complaints[e], t) | ||
|  |   end | ||
|  |   function savecomplaint(e1, e2, ...) | ||
|  |     save(e1, { ... }) | ||
|  |     save(e2, { ... }) | ||
|  |   end | ||
|  |   local function call(c, ...) | ||
|  |     return c(...) | ||
|  |   end | ||
|  |   function issuecomplaints(e) | ||
|  |     for _, c in ipairs(complaints[e] or { }) do | ||
|  |       call(unpack(c)) | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<Lua utility functions>>= | ||
|  | -- return 'key' or 'type' or 'field <name>' at which entries differ, | ||
|  | -- or nil if entries are the same | ||
|  | local function entries_differ(e1, e2, notkey) | ||
|  |   if e1.key  ~= e2.key  and not notkey then return 'key'  end | ||
|  |   if e1.type ~= e2.type                then return 'type' end | ||
|  |   for k, v in pairs(e1.fields) do | ||
|  |     if e2.fields[k] ~= v then return 'field ' .. k end | ||
|  |   end | ||
|  |   for k, v in pairs(e2.fields) do | ||
|  |     if e1.fields[k] ~= v then return 'field ' .. k end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | I've seen at least one bibliography with identical entries listed | ||
|  | under multiple keys.  (Thanks, Andrew.) | ||
|  | <<Lua utility functions>>= | ||
|  | -- every entry is identical to every other | ||
|  | local function all_entries_identical(es, notkey) | ||
|  |   if table.getn(es) == 0 then return true end | ||
|  |   for i = 2, table.getn(es) do | ||
|  |     if entries_differ(es[1], es[i], notkey) then | ||
|  |       return false | ||
|  |     end | ||
|  |   end | ||
|  |   return true | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Computing and emitting the list of citations} | ||
|  | 
 | ||
|  | A significant complexity added in \nbibtex\ is that a single entry may | ||
|  | be cited using more than one citation key. | ||
|  | For example, [[\cite{milner:type-polymorphism}]] and  | ||
|  | [[\cite{milner:theory-polymorphism}]] may well specify the same paper. | ||
|  | Thus, in addition to a list of citations, I~also keep track of the set | ||
|  | of keys with which each entry is cited, as well as the first such key. | ||
|  | The function [[cite]] manages all these data structures. | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | local citations = { } -- list of citations | ||
|  | local cited = { } -- (entry -> key set) table | ||
|  | local first_cited = { } -- (entry -> key) table | ||
|  | local function cite(c, e) -- cite entry e with key c | ||
|  |   local seen = cited[e] | ||
|  |   cited[e] = seen or { } | ||
|  |   cited[e][c] = true | ||
|  |   if not seen then | ||
|  |     first_cited[e] = c | ||
|  |     table.insert(citations, e) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | When the dust settles, we adjust members of each citation record: | ||
|  | the first key actually used becomes [[key]], | ||
|  | the original key becomes [[orig_key]], and other keys go into [[also_cited_as]]. | ||
|  | <<using [[cited]] and [[first_cited]], adjust fields [[key]] and [[also_cited_as]]>>= | ||
|  | for i = 1, table.getn(citations) do | ||
|  |   local c = citations[i] | ||
|  |   local key = assert(first_cited[c], "citation is not cited?!") | ||
|  |   c.orig_key, c.key = c.key, key | ||
|  |   local also = { } | ||
|  |   for k in pairs(cited[c]) do | ||
|  |     if k ~= key then table.insert(also, k) end | ||
|  |   end | ||
|  |   c.also_cited_as = also | ||
|  | end | ||
|  | @  | ||
|  | For each actual [[\cite]] command in the original {\LaTeX} file, we | ||
|  | call [[find_entry]] to find an appropriate \bibtex\ entry. | ||
|  | Because a [[\cite]] command might match more than one paper, the | ||
|  | results may be ambiguous.   | ||
|  | We therefore produce a list of all \emph{candidates} matching the | ||
|  | [[\cite]] command. | ||
|  | If we're permissive, we search one list of entries after another, | ||
|  | stopping as soon as we get some candidates. | ||
|  | If we're not permissive, we have just one list of entries overall, so | ||
|  | we search it and we're done. | ||
|  | If permissive, we search entry lists in turn until we  | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | local find_entry -- function from key to citation | ||
|  | do | ||
|  |   local cache = { } -- (citation-key -> entry) table | ||
|  | 
 | ||
|  |   function find_entry(c) | ||
|  |     local function remember(e) cache[c] = e; return e end -- cache e and return it | ||
|  | 
 | ||
|  |     if cache[c] or dupcheck[c] then | ||
|  |       return cache[c] or dupcheck[c] | ||
|  |     else | ||
|  |       local candidates | ||
|  |       if permissive then | ||
|  |         for _, entries in ipairs(bibentries) do | ||
|  |           candidates = query(c, entries) | ||
|  |           if table.getn(candidates) > 0 then break end | ||
|  |         end | ||
|  |       else | ||
|  |         candidates = query(c, bibentries) | ||
|  |       end | ||
|  |       assert(candidates)     | ||
|  |       <<from the available [[candidates]], choose one and [[remember]] it>> | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | If we have no candidates, we're hosed. | ||
|  | Otherwise, if all the candidates are identical (most likely when there | ||
|  | is a unique candidate, but still possible otherwise),\footnote | ||
|  | {Andrew Appel has a bibliography in which the \emph{Definition of | ||
|  |     Standard~ML} appears as two different entries that are identical | ||
|  |   except for keys.} | ||
|  | we take the first. | ||
|  | Finally,  if there are multiple, distinct candidates to choose from, | ||
|  | we take the first and issue a warning message. | ||
|  | To avoid surprising the unwary coauthor, we put a warning message into | ||
|  | the entry as well, from which it will go into the printed bibliography. | ||
|  | <<from the available [[candidates]], choose one and [[remember]] it>>= | ||
|  | if table.getn(candidates) == 0 then | ||
|  |   biberrorf('No .bib entry matches \\cite{%s}', c) | ||
|  | elseif all_entries_identical(candidates, 'notkey') then | ||
|  |   logf("Query '%s' produced unique candidate %s from %s\n", | ||
|  |           c, candidates[1].key, candidates[1].file) | ||
|  |   return remember(candidates[1]) | ||
|  | else | ||
|  |   local e = table.copy(candidates[1]) | ||
|  |   <<warn of multiple candidates for query [[c]]>> | ||
|  |   e.warningmsg = string.format('[This entry is the first match for query ' .. | ||
|  |                                '\\texttt{%s}, which produced %d matches.]', | ||
|  |                                c, table.getn(candidates)) | ||
|  |   return remember(e) | ||
|  | end | ||
|  | @  | ||
|  | I can do better later\ldots | ||
|  | <<warn of multiple candidates for query [[c]]>>= | ||
|  | bibwarnf("Query '%s' produced %d candidates\n    (using %s from %s)\n", | ||
|  |          c, table.getn(candidates), e.key, e.file) | ||
|  | bibwarnf("First two differ in %s\n", entries_differ(candidates[1], candidates[2], true)) | ||
|  | @  | ||
|  | The [[query]] function uses the engine described in Section~\ref{sec:query}. | ||
|  | <<definition of [[query]], used to search a list of entries>>= | ||
|  | function query(c, entries) | ||
|  |   local p = matchq(c) | ||
|  |   local t = { } | ||
|  |   for _, e in ipairs(entries) do | ||
|  |     if p(e.type, e.fields) then | ||
|  |       table.insert(t, e) | ||
|  |     end | ||
|  |   end | ||
|  |   return t | ||
|  | end | ||
|  | bibtex.query = query | ||
|  | bibtex.doc.query = 'query: string -> entry list -> entry list' | ||
|  | <<declarations of internal functions>>= | ||
|  | local query | ||
|  | local matchq | ||
|  | bibtex.doc.matchq = 'matchq: string -> predicate --- compile query string' | ||
|  | bibtex.matchq = matchq | ||
|  | @  | ||
|  | Finally we can compute the list of entries: | ||
|  | search on each citation key, and if we had [[\cite{*}]] or | ||
|  | [[\nocite{*}]], add all the other entries as well. | ||
|  | The [[cite]] command takes care of avoiding duplicates. | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | for _, c in ipairs(citekeys) do | ||
|  |   local e = find_entry(c) | ||
|  |   if e then cite(c, e) end | ||
|  | end | ||
|  | if cited_star then | ||
|  |   for _, es in ipairs(permissive and bibentries or {bibentries}) do | ||
|  |     logf('Adding all entries in list of %d\n', table.getn(es)) | ||
|  |     for _, e in ipairs(es) do | ||
|  |       cite(e.key, e) | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | <<using [[cited]] and [[first_cited]], adjust fields [[key]] and [[also_cited_as]]>> | ||
|  | @  | ||
|  | I've always hated \bibtex's cross-reference feature, but I believe | ||
|  | I've implemented it faithfully. | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | bibtex.do_crossrefs(citations, find_entry) | ||
|  | @  | ||
|  | With the entries computed, there are two ways to emit: | ||
|  | as another \bibtex\ file or as required by the style file. | ||
|  | So that we can read from [[bblname]] before writing to it, | ||
|  | the opening of [[bbl]] is carefully delayed to this point. | ||
|  | <<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>= | ||
|  | <<emit warnings for entries in [[citations]]>> | ||
|  | local bbl = bblname == '-' and io.stdout or open(bblname, 'w') | ||
|  | if bib_out then | ||
|  |   bibtex.emit(bbl, preamble, citations) | ||
|  | else | ||
|  |   bibstyle.emit(bbl, preamble, citations) | ||
|  | end | ||
|  | if bblname ~= '-' then bbl:close() end | ||
|  | @  | ||
|  | Here's a function to emit a list of citations as \bibtex\ source. | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.emit =  | ||
|  |   'outfile * string list * entry list -> unit -- write citations in .bib format' | ||
|  | function bibtex.emit(bbl, preamble, citations) | ||
|  |   local warned = false | ||
|  |   if preamble[1] then | ||
|  |     bbl:write('@preamble{\n') | ||
|  |     for i = 1, table.getn(preamble) do | ||
|  |       bbl:write(string.format('  %s "%s"\n', i > 1 and '#' or ' ', preamble[i])) | ||
|  |     end | ||
|  |     bbl:write('}\n\n') | ||
|  |   end | ||
|  |   for _, e in ipairs(citations) do | ||
|  |     local also = e.also_cited_as | ||
|  |     if also and table.getn(also) > 0 then | ||
|  |       for _, k in ipairs(e.also_cited_as or { }) do | ||
|  |         bbl:write(string.format('@%s{%s, crossref={%s}}\n', e.type, k, e.key)) | ||
|  |       end | ||
|  |       if not warned then | ||
|  |         warned = true | ||
|  |         bibwarnf("Warning: some entries (such as %s) are cited with multiple keys;\n".. | ||
|  |           "  in the emitted .bib file, these entries are duplicated (using crossref)\n", | ||
|  |                  e.key) | ||
|  |       end | ||
|  |     end | ||
|  |     emit_tkf.bib(bbl, e.type, e.key, e.fields) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<emit warnings for entries in [[citations]]>>= | ||
|  | for _, e in ipairs(citations) do | ||
|  |   if warnings[e] then | ||
|  |     for _, w in ipairs(warnings[e]) do emit_warning(unpack(w)) end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Cross-reference} | ||
|  | 
 | ||
|  | If an entry contains a [[crossref]] field,  | ||
|  | that field is used as a key to find the parent, and the entry inherits | ||
|  | missing fields from the parent. | ||
|  | 
 | ||
|  | If the parent is cross-referenced sufficiently often (i.e., more than | ||
|  | [[min_crossref]] times), it may be added | ||
|  | to the citation list, in which case the style file knows what to do | ||
|  | with the [[crossref]] field. | ||
|  | But if the parent is not cited sufficiently often,  | ||
|  | it disappears, and do does the [[crossref]] field. | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.do_crossrefs = "citation list -> unit # add crossref'ed fields in place" | ||
|  | function bibtex.do_crossrefs(citations, find_entry) | ||
|  |   local map  = { } --- key to entry (on citation list) | ||
|  |   local xmap = { } --- key to entry (xref'd only) | ||
|  |   local xref_count = { } -- entry -> number of times xref'd | ||
|  |   <<make [[map]] map lower-case keys in [[citations]] to their entries>> | ||
|  |   for i = 1, table.getn(citations) do | ||
|  |     local c = citations[i] | ||
|  |     if c.fields.crossref then | ||
|  |       local lowref = string.lower(c.fields.crossref) | ||
|  |       local parent = map[lowref] or xmap[lowref] | ||
|  |       if not parent and find_entry then | ||
|  |         parent = find_entry(lowref) | ||
|  |         xmap[lowref] = parent | ||
|  |       end | ||
|  |       if not parent then | ||
|  |         biberrorf("Entry %s cross-references to %s, but I can't find %s", | ||
|  |                   c.key, c.fields.crossref, c.fields.crossref) | ||
|  |         c.fields.crossref = nil | ||
|  |       else | ||
|  |         xref_count[parent] = (xref_count[parent] or 0) + 1  | ||
|  |         local fields = c.fields | ||
|  |         fields.crossref = parent.key -- force a case match! | ||
|  |         for k, v in pairs(parent.fields) do -- inherit field if missing | ||
|  |           fields[k] = fields[k] or v | ||
|  |         end | ||
|  |       end | ||
|  |     end | ||
|  |   end | ||
|  |   <<add oft-crossref'd entries from [[xmap]] to the list in [[citations]]>> | ||
|  |   <<remove [[crossref]] fields for entries with seldom-crossref'd parents>> | ||
|  | end | ||
|  | <<make [[map]] map lower-case keys in [[citations]] to their entries>>= | ||
|  | for i = 1, table.getn(citations) do | ||
|  |   local c = citations[i] | ||
|  |   local key = string.lower(c.key) | ||
|  |   map[key] = map[key] or c | ||
|  | end | ||
|  | <<add oft-crossref'd entries from [[xmap]] to the list in [[citations]]>>= | ||
|  | for _, e in pairs(xmap) do -- includes only missing entries | ||
|  |   if xref_count[e] >= min_crossrefs then | ||
|  |     table.insert(citations, e) | ||
|  |   end | ||
|  | end | ||
|  | <<remove [[crossref]] fields for entries with seldom-crossref'd parents>>= | ||
|  | for i = 1, table.getn(citations) do | ||
|  |   local c = citations[i] | ||
|  |   if c.fields.crossref then | ||
|  |     local parent = xmap[string.lower(c.fields.crossref)] | ||
|  |     if parent and xref_count[parent] < min_crossrefs then | ||
|  |       c.fields.crossref = nil | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsection{The query engine (i.e., the point of it all)} | ||
|  | 
 | ||
|  | \label{sec:query} | ||
|  | 
 | ||
|  | The query language is described in the man page for [[nbibtex]]. | ||
|  | Its implementation is divided into two parts: | ||
|  | the internal predicates which are composed to form a query predicate,  | ||
|  | and the parser that takes a string and produces a query predicate. | ||
|  | Function [[matchq]] is declared [[local]] above and is the only | ||
|  | function visible outside this block. | ||
|  | <<exported Lua functions>>= | ||
|  | do | ||
|  |   if not boyer_moore then | ||
|  |     require 'boyer-moore' | ||
|  |   end | ||
|  |   local bm = boyer_moore | ||
|  |   local compile = bm.compilenc | ||
|  |   local search  = bm.matchnc | ||
|  | 
 | ||
|  |   -- type predicate = type * field table -> bool | ||
|  |   -- val match   : field * string -> predicate | ||
|  |   -- val author  : string -> predicate | ||
|  |   -- val matchty : string -> predicate | ||
|  |   -- val andp    : predicate option * predicate option -> predicate option | ||
|  |   -- val orp     : predicate option * predicate option -> predicate option | ||
|  |   -- val matchq  : string -> predicate --- compile query string | ||
|  | 
 | ||
|  |   <<definitions of query-predicate functions>> | ||
|  | 
 | ||
|  |   <<definition of [[matchq]], the query compiler>> | ||
|  |   <<definition of [[query]], used to search a list of entries>> | ||
|  | end | ||
|  | @  | ||
|  | \subsubsection{Query predicates} | ||
|  | 
 | ||
|  | The common case is a predicate for a named field. | ||
|  | We also have some special syntax for ``all fields'' and the \bibtex\ | ||
|  | ``type,'' which is not a field. | ||
|  | <<definitions of query-predicate functions>>= | ||
|  | local matchty | ||
|  | local function match(field, string) | ||
|  |   if string == '' then return nil end | ||
|  |   local pat = compile(string) | ||
|  |   if field == '*' then | ||
|  |     return function (t, fields) | ||
|  |              for _, v in pairs(fields) do if search(pat, v) then return true end end | ||
|  |            end | ||
|  |   elseif field == '[type]' then | ||
|  |     return matchty(string) | ||
|  |   else | ||
|  |     return function (t, fields) return search(pat, fields[field] or '') end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | Here's a type matcher. | ||
|  | <<definitions of query-predicate functions>>= | ||
|  | function matchty(string) | ||
|  |   if string == '' then return nil end | ||
|  |   local pat = compile(string) | ||
|  |   return function (t, fields) return search(pat, t) end | ||
|  | end | ||
|  | @  | ||
|  | We make a special case of [[author]] because it really means ``author | ||
|  | or editor.'' | ||
|  | <<definitions of query-predicate functions>>= | ||
|  | local function author(string) | ||
|  |   if string == '' then return nil end | ||
|  |   local pat = compile(string) | ||
|  |   return function (t, fields) | ||
|  |            return search(pat, fields.author or fields.editor or '') | ||
|  |          end | ||
|  | end | ||
|  | @  | ||
|  | We conjoin and disjoin predicates, being careful to use tail calls | ||
|  | (not [[and]] and [[or]]) in order to save stack space. | ||
|  | <<definitions of query-predicate functions>>= | ||
|  | local function andp(p, q) | ||
|  |   -- associate to right for constant stack space | ||
|  |   if not p then | ||
|  |     return q | ||
|  |   elseif not q then  | ||
|  |     return p | ||
|  |   else  | ||
|  |     return function (t,f) if p(t,f) then return q(t,f) end end | ||
|  |   end | ||
|  | end | ||
|  | <<definitions of query-predicate functions>>= | ||
|  | local function orp(p, q) | ||
|  |   -- associate to right for constant stack space | ||
|  |   if not p then | ||
|  |     return q | ||
|  |   elseif not q then  | ||
|  |     return p | ||
|  |   else  | ||
|  |     return function (t,f) if p(t,f) then return true else return q(t,f) end end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsubsection{The query compiler} | ||
|  | 
 | ||
|  | The function [[matchq]] takes the syntax explained in the man page and | ||
|  | produces a predicate. | ||
|  | <<definition of [[matchq]], the query compiler>>= | ||
|  | function matchq(query) | ||
|  |   local find = string.find | ||
|  |   local parts = split(query, '%:') | ||
|  |   local p = nil | ||
|  |   if parts[1] and not find(parts[1], '=') then | ||
|  |     <<add to [[p]] a match for [[parts[1]]] as author>> | ||
|  |     table.remove(parts, 1) | ||
|  |     if parts[1] and not find(parts[1], '=') then | ||
|  |       <<add to [[p]] a match for [[parts[1]]] as title or year>> | ||
|  |       table.remove(parts, 1) | ||
|  |       if parts[1] and not find(parts[1], '=') then | ||
|  |         <<add to [[p]] a match for [[parts[1]]] as type or year>> | ||
|  |         table.remove(parts, 1) | ||
|  |       end | ||
|  |     end | ||
|  |   end | ||
|  |   for _, part in ipairs(parts) do | ||
|  |     if not find(part, '=') then | ||
|  |       biberrorf('bad query %q --- late specs need = sign', query) | ||
|  |     else | ||
|  |       local _, _, field, words = find(part, '^(.*)=(.*)$') | ||
|  |       assert(field and words, 'bug in query parsing') | ||
|  |       <<add to [[p]] a match for [[words]] as [[field]]>> | ||
|  |     end | ||
|  |   end | ||
|  |   if not p then | ||
|  |     bibwarnf('empty query---matches everything\n') | ||
|  |     return function() return true end | ||
|  |   else | ||
|  |     return p | ||
|  |   end | ||
|  | end | ||
|  | @        | ||
|  | Here's where an unnamed key defaults to author or editor. | ||
|  | <<add to [[p]] a match for [[parts[1]]] as author>>= | ||
|  | for _, word in ipairs(split(parts[1], '-')) do | ||
|  |   p = andp(author(word), p) | ||
|  | end | ||
|  | <<add to [[p]] a match for [[parts[1]]] as title or year>>= | ||
|  | local field, words = find(parts[1], '%D') and 'title' or 'year', parts[1] | ||
|  | <<add to [[p]] a match for [[words]] as [[field]]>> | ||
|  | <<add to [[p]] a match for [[parts[1]]] as type or year>>= | ||
|  | if find(parts[1], '%D') then | ||
|  |   local ty = nil | ||
|  |   for _, word in ipairs(split(parts[1], '-')) do | ||
|  |     ty = orp(matchty(word), ty) | ||
|  |   end | ||
|  |   p = andp(p, ty) --- check type last for efficiency | ||
|  | else | ||
|  |   for _, word in ipairs(split(parts[1], '-')) do | ||
|  |     p = andp(p, match('year', word)) -- check year last for efficiency | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | There could be lots of matches on a year, so we check years last. | ||
|  | <<add to [[p]] a match for [[words]] as [[field]]>>= | ||
|  | for _, word in ipairs(split(words, '-')) do | ||
|  |   if field == 'year' then | ||
|  |     p = andp(p, match(field, word)) | ||
|  |   else | ||
|  |     p = andp(match(field, word), p) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Path search and other system-dependent stuff} | ||
|  | 
 | ||
|  | To find a bib file, I rely on the \texttt{kpsewhich} program, | ||
|  | which is typically found on Unix {\TeX} installations, and which | ||
|  | should guarantee to find the same bib files as normal bibtex. | ||
|  | <<Lua utility functions>>= | ||
|  | assert(io.popen) | ||
|  | local function capture(cmd, raw) | ||
|  |   local f = assert(io.popen(cmd, 'r')) | ||
|  |   local s = assert(f:read('*a')) | ||
|  |   assert(f:close())  --- can't get an exit code | ||
|  |   if raw then return s end | ||
|  |   s = string.gsub(s, '^%s+', '') | ||
|  |   s = string.gsub(s, '%s+$', '') | ||
|  |   s = string.gsub(s, '[\n\r]+', ' ') | ||
|  |   return s | ||
|  | end | ||
|  | @  | ||
|  | Function [[bibpath]] is normally called on a bibname in a {\LaTeX} | ||
|  | file, but because a bibname may also be given on the command line,  | ||
|  | we add \texttt{.bib} only if not already present. | ||
|  | Also, because we can | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.bibpath = 'string -> string # from \\bibliography name, find pathname of file' | ||
|  | function bibtex.bibpath(bib) | ||
|  |   if find(bib, '/') then | ||
|  |     local f, msg = io.open(bib) | ||
|  |     if not f then | ||
|  |       return nil, msg | ||
|  |     else | ||
|  |       f:close() | ||
|  |       return bib | ||
|  |     end | ||
|  |   else | ||
|  |     if not find(bib, '%.bib$') then | ||
|  |       bib = bib .. '.bib' | ||
|  |     end | ||
|  |     local pathname = capture('kpsewhich ' .. bib) | ||
|  |     if string.len(pathname) > 1 then | ||
|  |       return pathname | ||
|  |     else | ||
|  |       return nil, 'kpsewhich cannot find ' .. bib | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \section{Implementation of \texttt{nbibfind}} | ||
|  | 
 | ||
|  | 
 | ||
|  | \subsection{Output formats for \bibtex\ entries} | ||
|  | 
 | ||
|  | We can emit a \bibtex\ entry in any of three formats: | ||
|  | [[bib]], [[terse]], and [[full]]. | ||
|  | An emitter takes as arguments the type, key, and fields of the entry, | ||
|  | and optionally the name of the file the entry came from. | ||
|  | <<Lua utility functions>>= | ||
|  | local emit_tkf = { } | ||
|  | @  | ||
|  | The simplest entry is legitimate \bibtex\ source: | ||
|  | <<exported Lua functions>>= | ||
|  | function emit_tkf.bib(outfile, type, key, fields) | ||
|  |   outfile:write('@', type, '{', key, ',\n') | ||
|  |   for k, v in pairs(fields) do | ||
|  |     outfile:write('  ', k, ' = {', v, '},\n') | ||
|  |   end | ||
|  |   outfile:write('}\n\n') | ||
|  | end | ||
|  | @  | ||
|  | For the other two entries, we devise a string format. | ||
|  | In principle, we could go with an ASCII form of a full-blown style, | ||
|  | but since the purpose is to identify the entry in relatively few | ||
|  | characters, it seems sufficient to spit out the author, year, title, | ||
|  | and possibly the source. | ||
|  | ``Full'' output shows the whole string; ``terse'' is just the first line. | ||
|  | <<exported Lua functions>>= | ||
|  | do | ||
|  |   local function bibstring(type, key, fields, bib) | ||
|  |     <<define local [[format_lab_names]] as for a bibliography label>> | ||
|  |     local names = format_lab_names(fields.author) or | ||
|  |                   format_lab_names(fields.editor) or | ||
|  |                   fields.key or fields.organization or '????' | ||
|  |     local year = fields.year | ||
|  |     local lbl = names .. (year and ' ' .. year or '') | ||
|  |     local title = fields.title or '????' | ||
|  |     if bib then | ||
|  |       key = string.gsub(bib, '.*/', '') .. ': ' .. key | ||
|  |     end | ||
|  |     local answer = | ||
|  |       bib and | ||
|  |       string.format('%-25s = %s: %s', key, lbl, title) or | ||
|  |       string.format('%-21s = %s: %s', key, lbl, title) | ||
|  |     local where = fields.booktitle or fields.journal | ||
|  |     if where then answer = answer .. ', in ' .. where end | ||
|  |     answer = string.gsub(answer, '%~', ' ') | ||
|  |     for _, cs in ipairs { 'texttt', 'emph', 'textrm', 'textup' } do | ||
|  |       answer = string.gsub(answer, '\\' .. cs .. '%A', '') | ||
|  |     end | ||
|  |     answer = string.gsub(answer, '[%{%}]', '') | ||
|  |     return answer | ||
|  |   end | ||
|  | 
 | ||
|  |   function emit_tkf.terse(outfile, type, key, fields, bib) | ||
|  |     outfile:write(truncate(bibstring(type, key, fields, bib), 80), '\n') | ||
|  |   end | ||
|  | 
 | ||
|  |   function emit_tkf.full(outfile, type, key, fields, bib) | ||
|  |     local w = bst.writer(outfile) | ||
|  |     w:write(bibstring(type, key, fields, bib), '\n') | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<define local [[format_lab_names]] as for a bibliography label>>= | ||
|  | local format_lab_names | ||
|  | do  | ||
|  |   local fmt = '{vv }{ll}' | ||
|  |   local function format_names(s) | ||
|  |     local s = bst.commafy(bst.format_names(fmt, bst.namesplit(s))) | ||
|  |     return (string.gsub(s, ' and others$', ' et al.')) | ||
|  |   end | ||
|  |   function format_lab_names(s) | ||
|  |     if not s then return s end | ||
|  |     local t = bst.namesplit(s) | ||
|  |     if table.getn(t) > 3 then | ||
|  |       return bst.format_name(fmt, t[1]) .. ' et al.' | ||
|  |     else | ||
|  |       return format_names(s) | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | Function [[truncate]] | ||
|  | returns enough of a string to fit in [[n]] columns, with ellipses as | ||
|  | needed.  | ||
|  | <<Lua utility functions>>= | ||
|  | local function truncate(s, n) | ||
|  |   local l = string.len(s) | ||
|  |   if l <= n then | ||
|  |     return s | ||
|  |   else | ||
|  |     return string.sub(s, 1, n-3) .. '...' | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | @ | ||
|  | \subsection{Main functions for \texttt{nbibfind}} | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.run_find = 'string list -> unit # main program for nbibfind' | ||
|  | bibtex.doc.find = 'string * string list -> entry list' | ||
|  | 
 | ||
|  | function bibtex.find(pattern, bibs) | ||
|  |   local es = { } | ||
|  |   local p = matchq(pattern) | ||
|  |   for _, bib in ipairs(bibs) do | ||
|  |     local rdr = bibtex.open(bib, bst.months(), hold_warning) | ||
|  |     for type, key, fields in entries(rdr) do | ||
|  |       if type == nil then | ||
|  |         break | ||
|  |       elseif not type then | ||
|  |         io.stderr:write('Something disastrous happened with entry ', key, '\n') | ||
|  |       elseif key == pattern or p(type, fields) then | ||
|  |         <<emit held warnings, if any>> | ||
|  |         table.insert(es, { type = type, key = key, fields = fields, | ||
|  |                            bib = table.getn(bibs) > 1 and bib }) | ||
|  |       else | ||
|  |         drop_warnings() | ||
|  |       end | ||
|  |     end | ||
|  |     rdr:close() | ||
|  |   end | ||
|  |   return es | ||
|  | end | ||
|  | 
 | ||
|  | function bibtex.run_find(argv) | ||
|  |   local emit = emit_tkf.terse | ||
|  |   while argv[1] and find(argv[1], '^-') do | ||
|  |     if emit_tkf[string.sub(argv[1], 2)] then | ||
|  |       emit = emit_tkf[string.sub(argv[1], 2)] | ||
|  |     else | ||
|  |       biberrorf('Unrecognized option %s', argv[1]) | ||
|  |     end | ||
|  |     table.remove(argv, 1) | ||
|  |   end | ||
|  |   if table.getn(argv) == 0 then | ||
|  |     io.stderr:write(string.format('Usage: %s [-bib|-terse|-full] pattern [bibs]\n', | ||
|  |                                   string.gsub(argv[0], '.*/', ''))) | ||
|  |     os.exit(1) | ||
|  |   end | ||
|  |   local pattern = table.remove(argv, 1) | ||
|  |   local bibs = { } | ||
|  |   <<make [[bibs]] the list of pathnames implied by [[argv]]>> | ||
|  | 
 | ||
|  |   local entries = bibtex.find(pattern, bibs) | ||
|  |   for _, e in ipairs(entries) do | ||
|  |     emit(io.stdout, e.type, e.key, e.fields, e.bib) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | If we have no arguments, search all available bibfiles. | ||
|  | Otherwise, an argument with a~[[/]] is a pathname, and  | ||
|  | an argument without~[[/]] is a name as it would appear in  | ||
|  | [[\bibliography]]. | ||
|  | <<make [[bibs]] the list of pathnames implied by [[argv]]>>= | ||
|  | if table.getn(argv) == 0 then | ||
|  |   bibs = all_bibs() | ||
|  | else | ||
|  |   for _, a in ipairs(argv) do | ||
|  |     if find(a, '/') then | ||
|  |       table.insert(bibs, a) | ||
|  |     else | ||
|  |       table.insert(bibs, assert(bibtex.bibpath(a))) | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<emit held warnings, if any>>= | ||
|  | local ws = held_warnings() | ||
|  | if ws then | ||
|  |   for _, w in ipairs(ws) do | ||
|  |     emit_warning(unpack(w)) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | To search all bib files, we lean heavily on \texttt{kpsewhich}, which is | ||
|  | distributed with the Web2C version of {\TeX}, and which knows exactly | ||
|  | which directories to search. | ||
|  | <<post-split Lua utility functions>>= | ||
|  | local function all_bibs() | ||
|  |   local pre_path = assert(capture('kpsewhich -show-path bib')) | ||
|  |   local path = assert(capture('kpsewhich -expand-path ' .. pre_path)) | ||
|  |   local bibs = { } -- list of results | ||
|  |   local inserted = { } -- set of inserted bibs, to avoid duplicates | ||
|  |   for _, dir in ipairs(split(path, ':')) do | ||
|  |     local files = assert(capture('echo ' .. dir .. '/*.bib')) | ||
|  |     for _, file in ipairs(split(files, '%s')) do | ||
|  |       if readable(file) then | ||
|  |         if not (workaround.badbibs and (find(file, 'amsxport%-options') or | ||
|  |                                         find(file, '/plbib%.bib$'))) | ||
|  |         then | ||
|  |           if not inserted[file] then | ||
|  |             table.insert(bibs, file) | ||
|  |             inserted[file] = true | ||
|  |           end | ||
|  |         end | ||
|  |       end | ||
|  |     end | ||
|  |   end | ||
|  |   return bibs | ||
|  | end | ||
|  | bibtex.all_bibs = all_bibs | ||
|  | @ Notice the [[workaround.badbibs]], which prevents us from searching | ||
|  | some bogus bibfiles that come with Thomas Esser's te{\TeX}. | ||
|  | @ | ||
|  | It's a pity there's no more efficient way to see if a file is readable | ||
|  | than to try to read it, but that's portability for you. | ||
|  | <<Lua utility functions>>= | ||
|  | local function readable(file) | ||
|  |   local f, msg = io.open(file, 'r') | ||
|  |   if f then | ||
|  |     f:close() | ||
|  |     return true | ||
|  |   else | ||
|  |     return false, msg | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | 
 | ||
|  | \section{Support for style files} | ||
|  | 
 | ||
|  | A \bibtex\ style file is used to turn a \bibtex\ entry into {\TeX} or | ||
|  | {\LaTeX} code suitable for inclusion in a bibliography. | ||
|  | It can also be used for many other wondrous purposes, such as | ||
|  | generating HTML for web pages. | ||
|  | In classes \bibtex, each style file is written in a rudimentary, | ||
|  | unnamed, stack-based language, | ||
|  | which is described in a document called ``Designing \bibtex\ Styles,'' | ||
|  | which is often called \texttt{btxhak.dvi}. | ||
|  | One of the benefits of \nbibtex\ is that styles can instead be written | ||
|  | in Lua, which is a much more powerful language---and perhaps even | ||
|  | easier to read. | ||
|  | 
 | ||
|  | But while Lua has amply powerful string-processing primitives, it | ||
|  | lacks some of the primitives that are specific to \bibtex. | ||
|  | Most notable among these primitives is the machinery for parsing and | ||
|  | formatting names (of authors, editors and so on). | ||
|  | That machinery is re-implemented here. | ||
|  | If documentation seems scanty, consult the original \texttt{btxhak}. | ||
|  | 
 | ||
|  | @ | ||
|  | In classic \bibtex, each style is its own separate file. | ||
|  | Here, we share code by allowing a single file to register multiple | ||
|  | styles. | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.register_style =  | ||
|  |   [[string * style -> unit # remember style with given name | ||
|  | type style = { emit   : outfile * string list * citation list -> unit | ||
|  |              , style  : table of formatting functions # defined document types | ||
|  |              , macros : unit -> macro table  | ||
|  |              }]] | ||
|  | bibtex.doc.style = 'name -> style # return style with given name, loading on demand' | ||
|  | 
 | ||
|  | do | ||
|  |   local styles = { } | ||
|  | 
 | ||
|  |   function bibtex.register_style(name, s) | ||
|  |     assert(not styles[name], "Duplicate registration of style " .. name) | ||
|  |     styles[name] = s | ||
|  |     s.name = s.name or name | ||
|  |   end | ||
|  | 
 | ||
|  |   function bibtex.style(name) | ||
|  |     if not styles[name] then | ||
|  |       local loaded  | ||
|  |       if config.nbs then | ||
|  |         local loaded = loadfile(config.nbs .. '/' .. name .. '.nbs') | ||
|  |         if loaded then loaded() end | ||
|  |       end | ||
|  |       if not loaded then | ||
|  |         require ('nbib-' .. name) | ||
|  |       end | ||
|  |       if not styles[name] then | ||
|  |         bibfatalf('Tried to load a file, but it did not register style %s\n', name) | ||
|  |       end | ||
|  |     end | ||
|  |     return styles[name]  | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | \subsection{Special string-processing support} | ||
|  | 
 | ||
|  | A great deal of \bibtex's processing depends on giving a special | ||
|  | status to substrings inside braces; | ||
|  | indeed, when such a substring begins with a backslash, it is called a | ||
|  | ``special character.'' | ||
|  | Accordingly, we provide a function to search for a pattern | ||
|  | \emph{outside} balanced braces. | ||
|  | <<Lua utility functions>>= | ||
|  | local function find_outside_braces(s, pat, i) | ||
|  |   local len = string.len(s) | ||
|  |   local j, k = string.find(s, pat, i) | ||
|  |   if not j then return j, k end | ||
|  |   local jb, kb = string.find(s, '%b{}', i) | ||
|  |   while jb and jb < j do --- scan past braces | ||
|  |     --- braces come first, so we search again after close brace | ||
|  |     local i2 = kb + 1 | ||
|  |     j, k = string.find(s, pat, i2) | ||
|  |     if not j then return j, k end | ||
|  |     jb, kb = string.find(s, '%b{}', i2) | ||
|  |   end | ||
|  |   -- either pat precedes braces or there are no braces | ||
|  |   return string.find(s, pat, j) --- 2nd call needed to get captures | ||
|  | end | ||
|  | @  | ||
|  | \subsubsection{String splitting} | ||
|  | 
 | ||
|  | Another common theme in \bibtex\ is the list represented as string. | ||
|  | A~list of names is represented as a string with individual names | ||
|  | separated by ``and.'' | ||
|  | A~name itself is a list of parts separated by whitespace. | ||
|  | So here are some functions to do general splitting. | ||
|  | When we don't care about the separators, we use [[split]]; | ||
|  | when we care only about the separators, we use [[splitters]]; | ||
|  | and | ||
|  | when we care about both, we use [[odd_even_split]]. | ||
|  | <<Lua utility functions>>= | ||
|  | local function split(s, pat, find) --- return list of substrings separated by pat | ||
|  |   find = find or string.find -- could be find_outside_braces | ||
|  |   local len = string.len(s) | ||
|  |   local t = { } | ||
|  |   local insert = table.insert | ||
|  |   local i, j, k = 1, true | ||
|  |   while j and i <= len + 1 do | ||
|  |     j, k = find(s, pat, i) | ||
|  |     if j then | ||
|  |       insert(t, string.sub(s, i, j-1)) | ||
|  |       i = k + 1 | ||
|  |     else | ||
|  |       insert(t, string.sub(s, i)) | ||
|  |     end | ||
|  |   end | ||
|  |   return t | ||
|  | end | ||
|  | @  | ||
|  | Function [[splitters]] returns a table that, when interleaved with the | ||
|  | result of [[split]], reconstructs the original string. | ||
|  | <<Lua utility functions>>= | ||
|  | local function splitters(s, pat, find) --- return list of separators | ||
|  |   find = find or string.find -- could be find_outside_braces | ||
|  |   local t = { } | ||
|  |   local insert = table.insert | ||
|  |   local j, k = find(s, pat, 1) | ||
|  |   while j do | ||
|  |     insert(t, string.sub(s, j, k)) | ||
|  |     j, k = find(s, pat, k+1) | ||
|  |   end | ||
|  |   return t | ||
|  | end | ||
|  | @  | ||
|  | Function [[odd_even_split]] makes odd entries strings between the | ||
|  | sought-for pattern and even entries the strings that match the pattern. | ||
|  | <<Lua utility functions>>= | ||
|  | local function odd_even_split(s, pat) | ||
|  |   local len = string.len(s) | ||
|  |   local t = { } | ||
|  |   local insert = table.insert | ||
|  |   local i, j, k = 1, true | ||
|  |   while j and i <= len + 1 do | ||
|  |     j, k = find(s, pat, i) | ||
|  |     if j then | ||
|  |       insert(t, string.sub(s, i, j-1)) | ||
|  |       insert(t, string.sub(s, j, k)) | ||
|  |       i = k + 1 | ||
|  |     else | ||
|  |       insert(t, string.sub(s, i)) | ||
|  |     end | ||
|  |   end | ||
|  |   return t | ||
|  | end | ||
|  | @  | ||
|  | As a special case, we may want to pull out brace-delimited substrings: | ||
|  | <<Lua utility functions>>= | ||
|  | local function brace_split(s) return odd_even_split(s, '%b{}') end | ||
|  | @  | ||
|  | Some things need splits. | ||
|  | <<Lua utility functions>>= | ||
|  | <<post-split Lua utility functions>> | ||
|  | @ | ||
|  | \subsubsection{String lengths and widths} | ||
|  | 
 | ||
|  | Function [[text_char_count]] counts characters,  | ||
|  | but a special counts as one character. | ||
|  | It is based on \bibtex's [[text.length$]] function. | ||
|  | <<Lua utility functions>>= | ||
|  | local function text_char_count(s) | ||
|  |   local n = 0 | ||
|  |   local i, last = 1, string.len(s) | ||
|  |   while i <= last do | ||
|  |     local special, splast, sp = find(s, '(%b{})', i) | ||
|  |     if not special then | ||
|  |       return n + (last - i + 1) | ||
|  |     elseif find(sp, '^{\\') then | ||
|  |       n = n + (special - i + 1) -- by statute, it's a single character | ||
|  |       i = splast + 1 | ||
|  |     else | ||
|  |       n = n + (splast - i + 1) - 2  -- don't count braces | ||
|  |       i = splast + 1 | ||
|  |     end | ||
|  |   end | ||
|  |   return n | ||
|  | end | ||
|  | bst.text_length = text_char_count | ||
|  | bst.doc.text_length = "string -> int # length (with 'special' char == 1)" | ||
|  | @  | ||
|  | Sometimes we want to know not how many characters are in a string, but | ||
|  | how much space we expect it to take when typeset. | ||
|  | (Or rather, we want to compare such widths to find the widest.) | ||
|  | This is original \bibtex's [[width$]] function. | ||
|  | 
 | ||
|  | The code should use the [[char_width]] array, for which | ||
|  | [[space]] is the only whitespace character given a nonzero printing | ||
|  | width.  The widths here are taken from Stanford's June~'87 | ||
|  | $cmr10$~font and represent hundredths of a point (rounded), but since | ||
|  | they're used only for relative comparisons, the units have no meaning. | ||
|  | <<exported Lua functions>>= | ||
|  | do | ||
|  |   local char_width = { } | ||
|  |   local special_widths = { ss = 500, ae = 722, oe = 778, AE = 903, oe = 1014 } | ||
|  |   for i = 0, 255 do char_width[i] = 0 end | ||
|  |   local char_width_from_32 = { | ||
|  |      278, 278, 500, 833, 500, 833, 778, 278, 389, 389, 500, 778, 278, 333, | ||
|  |      278, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 278, 278, | ||
|  |      278, 778, 472, 472, 778, 750, 708, 722, 764, 681, 653, 785, 750, 361, | ||
|  |      514, 778, 625, 917, 750, 778, 681, 778, 736, 556, 722, 750, 750, | ||
|  |      1028, 750, 750, 611, 278, 500, 278, 500, 278, 278, 500, 556, 444, | ||
|  |      556, 444, 306, 500, 556, 278, 306, 528, 278, 833, 556, 500, 556, 528, | ||
|  |      392, 394, 389, 556, 528, 722, 528, 528, 444, 500, 1000, 500, 500, | ||
|  |   } | ||
|  |   for i = 1, table.getn(char_width_from_32) do | ||
|  |     char_width[32+i-1] = char_width_from_32[i] | ||
|  |   end | ||
|  | 
 | ||
|  |   bst.doc.width = "string -> faux_points # width of string in 1987 cmr10" | ||
|  |   function bst.width(s) | ||
|  |     assert(false, 'have not implemented width yet') | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | \subsection{Parsing names and lists of names} | ||
|  | 
 | ||
|  | Names in a string are separated by \texttt{and} surrounded by nonnull | ||
|  | whitespace.  | ||
|  | Case is not significant. | ||
|  | <<exported Lua functions>>= | ||
|  | local function namesplit(s) | ||
|  |   local t = split(s, '%s+[aA][nN][dD]%s+', find_outside_braces) | ||
|  |   local i = 2 | ||
|  |   while i <= table.getn(t) do | ||
|  |     while find(t[i], '^[aA][nN][dD]%s+') do | ||
|  |       t[i] = string.gsub(t[i], '^[aA][nN][dD]%s+', '') | ||
|  |       table.insert(t, i, '') | ||
|  |       i = i + 1 | ||
|  |     end | ||
|  |     i = i + 1 | ||
|  |   end | ||
|  |   return t | ||
|  | end | ||
|  | bst.namesplit = namesplit | ||
|  | bst.doc.namesplit = 'string -> list of names # split names on "and"' | ||
|  | 
 | ||
|  | @  | ||
|  | <<exported Lua functions>>= | ||
|  | local sep_and_not_tie = '%-' | ||
|  | local sep_chars = sep_and_not_tie .. '%~' | ||
|  | @  | ||
|  | To parse an individual name, we want to count commas. | ||
|  | We first remove leading white space (and [[sep_char]]s), and trailing | ||
|  | white space (and [[sep_char]]s) and commas, complaining for each | ||
|  | trailing comma.   | ||
|  | 
 | ||
|  | We then represent the name as two sequences: [[tokens]] and | ||
|  | [[trailers]]. | ||
|  | The [[tokens]] are the names themselves, and the [[trailers]] are the | ||
|  | separator characters between tokens. | ||
|  | (A~separator is white space, a dash, or a tie, and multiple separators | ||
|  | in sequence are frowned upon.) | ||
|  | The [[commas]] table becomes an array mapping the comma number to the | ||
|  | index of the token that follows it. | ||
|  | <<exported Lua functions>>= | ||
|  | local parse_name | ||
|  | do | ||
|  |   local white_sep         = '[' .. sep_chars .. '%s]+' | ||
|  |   local white_comma_sep   = '[' .. sep_chars .. '%s%,]+' | ||
|  |   local trailing_commas   = '(,[' .. sep_chars .. '%s%,]*)$' | ||
|  |   local sep_char          = '[' .. sep_chars .. ']' | ||
|  |   local leading_white_sep = '^' .. white_sep | ||
|  | 
 | ||
|  |   <<name-parsing utilities>> | ||
|  | 
 | ||
|  |   function parse_name(s, inter_token) | ||
|  |     if string.find(s, trailing_commas) then | ||
|  |       biberrorf("Name '%s' has one or more commas at the end", s) | ||
|  |     end | ||
|  |     s = string.gsub(s, trailing_commas, '') | ||
|  |     s = string.gsub(s, leading_white_sep, '') | ||
|  |     local tokens = split(s, white_comma_sep, find_outside_braces) | ||
|  |     local trailers = splitters(s, white_comma_sep, find_outside_braces) | ||
|  |     <<rewrite [[trailers]] to hold a single separator character each>> | ||
|  |     local commas = { } --- maps each comma to index of token the follows it | ||
|  |     for i, t in ipairs(trailers) do | ||
|  |       string.gsub(t, ',', function() table.insert(commas, i+1) end) | ||
|  |     end | ||
|  |     local name = { } | ||
|  |     <<parse the name tokens and set fields of [[name]]>> | ||
|  |     return name | ||
|  |   end | ||
|  | end | ||
|  | bst.parse_name = parse_name | ||
|  | bst.doc.parse_name = 'string * string option -> name table' | ||
|  | @  | ||
|  | A~name has up to four parts: the most general form is either | ||
|  | ``First von Last, Junior'' or | ||
|  | ``von Last, First, Junior'', but various vons and Juniors can be | ||
|  | omitted. | ||
|  | The name-parsing algorithm is baroque and is transliterated from the | ||
|  | original \bibtex\ source, but the principle is clear: | ||
|  | assign the full version of each part to the four fields  | ||
|  | [[ff]], [[vv]], [[ll]], and [[jj]]; | ||
|  | and  | ||
|  | assign an abbreviated version of each part to the fields  | ||
|  | [[f]], [[v]], [[l]], and [[j]]. | ||
|  | <<parse the name tokens and set fields of [[name]]>>= | ||
|  | local first_start, first_lim, last_lim, von_start, von_lim, jr_lim | ||
|  |   -- variables mark subsequences; if start == lim, sequence is empty | ||
|  | local n = table.getn(tokens) | ||
|  | <<local parsing functions>> | ||
|  | 
 | ||
|  | local commacount = table.getn(commas) | ||
|  | if commacount == 0 then -- first von last jr | ||
|  |   von_start, first_start, last_lim, jr_lim = 1, 1, n+1, n+1 | ||
|  |   <<parse first von last jr>> | ||
|  | elseif commacount == 1 then -- von last jr, first | ||
|  |   von_start, last_lim, jr_lim, first_start, first_lim = | ||
|  |     1, commas[1], commas[1], commas[1], n+1 | ||
|  |   divide_von_from_last() | ||
|  | elseif commacount == 2 then -- von last, jr, first | ||
|  |   von_start, last_lim, jr_lim, first_start, first_lim = | ||
|  |     1, commas[1], commas[2], commas[2], n+1 | ||
|  |   divide_von_from_last() | ||
|  | else | ||
|  |   biberrorf("Too many commas in name '%s'") | ||
|  | end | ||
|  | <<set fields of name based on [[first_start]] and friends>> | ||
|  | @ | ||
|  | The von name, if any, goes from the first von token to the last von | ||
|  | token, except the last name is entitled to at least one token. | ||
|  | So to find the limit of the von name, we start just before the last | ||
|  | token and wind down until we find a von token or we hit the von start | ||
|  | (in which latter case there is no von name). | ||
|  | <<local parsing functions>>= | ||
|  | function divide_von_from_last() | ||
|  |   von_lim = last_lim - 1; | ||
|  |   while von_lim > von_start and not isVon(tokens[von_lim-1]) do | ||
|  |     von_lim = von_lim - 1 | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | OK, here's one form. | ||
|  | <<parse first von last jr>>= | ||
|  | local got_von = false | ||
|  | while von_start < last_lim-1 do | ||
|  |   if isVon(tokens[von_start]) then | ||
|  |     divide_von_from_last() | ||
|  |     got_von = true | ||
|  |     break | ||
|  |   else | ||
|  |     von_start = von_start + 1 | ||
|  |   end | ||
|  | end | ||
|  | if not got_von then -- there is no von name | ||
|  |   while von_start > 1 and find(trailers[von_start - 1], sep_and_not_tie) do | ||
|  |     von_start = von_start - 1 | ||
|  |   end | ||
|  |   von_lim = von_start | ||
|  | end | ||
|  | first_lim = von_start | ||
|  | @  | ||
|  | The last name starts just past the last token, before the first | ||
|  | comma (if there is no comma, there is deemed to be one at the end | ||
|  | of the string), for which there exists a first brace-level-0 letter | ||
|  | (or brace-level-1 special character), and it's in lower case, unless | ||
|  | this last token is also the last token before the comma, in which | ||
|  | case the last name starts with this token (unless this last token is | ||
|  | connected by a [[sep_char]] other than a [[tie]] to the previous token, in | ||
|  | which case the last name starts with as many tokens earlier as are | ||
|  | connected by non[[tie]]s to this last one (except on Tuesdays | ||
|  | $\ldots\,$), although this module never sees such a case).  Note that | ||
|  | if there are any tokens in either the von or last names, then the last | ||
|  | name has at least one, even if it starts with a lower-case letter. | ||
|  | @  | ||
|  | The string separating tokens is reduced to a single ``separator | ||
|  | character.'' | ||
|  | A~comma always trumps other separator characters. | ||
|  | Otherwise, if there's no comma, we take the first character, be it a | ||
|  | separator or a space. | ||
|  | (Patashnik considers that multiple such characters constitute | ||
|  | ``silliness'' on the user's part.)   | ||
|  | <<rewrite [[trailers]] to hold a single separator character each>>= | ||
|  | for i = 1, table.getn(trailers) do | ||
|  |   local s = trailers[i] | ||
|  |   assert(string.len(s) > 0) | ||
|  |   if find(s, ',') then | ||
|  |     trailers[i] = ',' | ||
|  |   else | ||
|  |     trailers[i] = string.sub(s, 1, 1) | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<set fields of name based on [[first_start]] and friends>>= | ||
|  | <<definition of function [[set_name]]>> | ||
|  | set_name(first_start, first_lim, 'ff', 'f') | ||
|  | set_name(von_start,   von_lim,   'vv', 'v') | ||
|  | set_name(von_lim,     last_lim,  'll', 'l') | ||
|  | set_name(last_lim,    jr_lim,    'jj', 'j') | ||
|  | @ | ||
|  | We set long and short forms together; [[ss]]~is the long form and | ||
|  | [[s]]~is the short form. | ||
|  | <<definition of function [[set_name]]>>= | ||
|  | local function set_name(start, lim, long, short) | ||
|  |   if start < lim then | ||
|  |     -- string concatenation is quadratic, but names are short | ||
|  |     <<definition of [[abbrev]], for shortening a token>> | ||
|  |     local ss = tokens[start] | ||
|  |     local s  = abbrev(tokens[start]) | ||
|  |     for i = start + 1, lim - 1 do | ||
|  |       if inter_token then | ||
|  |         ss = ss .. inter_token .. tokens[i] | ||
|  |         s  = s  .. inter_token .. abbrev(tokens[i]) | ||
|  |       else | ||
|  |         local ssep, nnext = trailers[i-1], tokens[i] | ||
|  |         local sep,  next  = ssep,          abbrev(nnext) | ||
|  |         <<possibly adjust [[sep]] and [[ssep]] according to token position and size>> | ||
|  |         ss = ss ..        ssep .. nnext | ||
|  |         s  = s  .. '.' .. sep  .. next | ||
|  |       end | ||
|  |     end | ||
|  |     name[long] = ss | ||
|  |     name[short] = s | ||
|  |   end | ||
|  | end | ||
|  | @ | ||
|  | Here is the default for a character between tokens: | ||
|  | a~tie is the default space character between the last two tokens of | ||
|  | the name part, and between the first two tokens if the first token is | ||
|  | short enough; otherwise, a space is the default. | ||
|  | <<possibly adjust [[sep]] and [[ssep]] according to token position and size>>= | ||
|  | if find(sep, sep_char) then | ||
|  |   -- do nothing; sep is OK | ||
|  | elseif i == lim-1 then | ||
|  |   sep, ssep = '~', '~' | ||
|  | elseif i == start + 1 then | ||
|  |   sep  = text_char_count(s)  < 3 and '~' or ' ' | ||
|  |   ssep = text_char_count(ss) < 3 and '~' or ' ' | ||
|  | else | ||
|  |   sep, ssep = ' ', ' ' | ||
|  | end | ||
|  | @ | ||
|  | The von name starts with the first token satisfying [[isVon]],  | ||
|  | unless that is the last token. | ||
|  | A~``von token'' is simply one that begins with a lower-case | ||
|  | letter---but those damn specials complicate everything. | ||
|  | <<Lua utility functions>>= | ||
|  | local upper_specials = { OE = true, AE = true, AA = true, O = true, L = true } | ||
|  | local lower_specials = { i = true, j = true, oe = true, ae = true, aa = true, | ||
|  |                          o = true, l = true, ss = true } | ||
|  | <<name-parsing utilities>>= | ||
|  | function isVon(s) | ||
|  |   local lower  = find_outside_braces(s, '%l') -- first nonbrace lowercase | ||
|  |   local letter = find_outside_braces(s, '%a') -- first nonbrace letter | ||
|  |   local bs, ebs, command = find_outside_braces(s, '%{%\\(%a+)') -- \xxx | ||
|  |   if lower and lower <= letter and lower <= (bs or lower) then | ||
|  |     return true | ||
|  |   elseif letter and letter <= (bs or letter) then | ||
|  |     return false | ||
|  |   elseif bs then | ||
|  |     if upper_specials[command] then | ||
|  |       return false | ||
|  |     elseif lower_specials[command] then | ||
|  |       return true | ||
|  |     else | ||
|  |       local close_brace = find_outside_braces(s, '%}', ebs+1) | ||
|  |       lower  = find(s, '%l') -- first nonbrace lowercase | ||
|  |       letter = find(s, '%a') -- first nonbrace letter | ||
|  |       return lower and lower <= letter | ||
|  |     end | ||
|  |   else | ||
|  |     return false | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | An abbreviated token is the first letter of a token, except again we | ||
|  | have to deal with the damned specials. | ||
|  | <<definition of [[abbrev]], for shortening a token>>= | ||
|  | local function abbrev(token) | ||
|  |   local first_alpha, _, alpha = find(token, '(%a)') | ||
|  |   local first_brace           = find(token, '%{%\\') | ||
|  |   if first_alpha and first_alpha <= (first_brace or first_alpha) then | ||
|  |     return alpha | ||
|  |   elseif first_brace then | ||
|  |     local i, j, special = find(token, '(%b{})', first_brace) | ||
|  |     if i then | ||
|  |       return special | ||
|  |     else -- unbalanced braces | ||
|  |       return string.sub(token, first_brace) | ||
|  |     end | ||
|  |   else | ||
|  |     return '' | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Formatting names} | ||
|  | 
 | ||
|  | Lacking Lua's string-processing utilities, classic \bibtex\ defines a | ||
|  | way of converting a ``format string'' and a name into a formatted | ||
|  | name. | ||
|  | I~find this formatting technique painful, but I also wanted to preserve | ||
|  | compatibility with existing bibliography styles, so I've implemented | ||
|  | it as accurately as I~can. | ||
|  | 
 | ||
|  | The interface is not quite identical to classic \bibtex; | ||
|  | a style can use [[namesplit]] to split names and then   | ||
|  | [[format_name]] to format a single one,  | ||
|  | or it can throw caution to the winds and call [[format_names]] to | ||
|  | format a whole list of names. | ||
|  | <<exported Lua functions>>= | ||
|  | bst.doc.format_names = "format * name list -> string list # format each name in list" | ||
|  | function bst.format_names(fmt, t) | ||
|  |   local u = { } | ||
|  |   for i = 1, table.getn(t) do | ||
|  |     u[i] = bst.format_name(fmt, t[i]) | ||
|  |   end | ||
|  |   return u | ||
|  | end | ||
|  | @  | ||
|  | A \bibtex\ format string contains its variable elements inside braces. | ||
|  | Thus, we format a name by replacing each braced substring of the | ||
|  | format string. | ||
|  | <<exported Lua functions>>= | ||
|  | do | ||
|  |   local good_keys = { ff = true, vv = true, ll = true, jj = true, | ||
|  |                       f  = true, v  = true, l  = true, j  = true, } | ||
|  | 
 | ||
|  |   bst.doc.format_name = "format * name -> string # format 1 name as in bibtex" | ||
|  |   function bst.format_name(fmt, name) | ||
|  |     local t = type(name) == 'table' and name or parse_name(name) | ||
|  |     -- at most one of the important letters, perhaps doubled, may appear | ||
|  |     local function replace_braced(s) | ||
|  |       local i, j, alpha = find_outside_braces(s, '(%a+)', 2) | ||
|  |       if not i then | ||
|  |         return '' --- can never be printed, but who are we to complain? | ||
|  |       elseif not good_keys[alpha] then | ||
|  |         biberrorf ('The format string %q has an illegal brace-level-1 letter', s) | ||
|  |       elseif find_outside_braces(s, '%a+', j+1) then | ||
|  |         biberrorf ('The format string %q has two sets of brace-level-1 letters', s) | ||
|  |       elseif t[alpha] then | ||
|  |         local k = j + 1 | ||
|  |         local t = t | ||
|  |         <<make [[k]] follow inter-token string, if any, rebuilding [[t]] as needed>> | ||
|  |         local head, tail = string.sub(s, 2, i-1) .. t[alpha], string.sub(s, k, -2) | ||
|  |         <<adjust [[tail]] to account for discretionality of ties, if any>> | ||
|  |         return head .. tail | ||
|  |       else | ||
|  |         return '' | ||
|  |       end | ||
|  |     end | ||
|  |     return (string.gsub(fmt, '%b{}', replace_braced)) | ||
|  |   end | ||
|  | end    | ||
|  | @          | ||
|  | <<make [[k]] follow inter-token string, if any, rebuilding [[t]] as needed>>= | ||
|  | local kk, jj = find(s, '%b{}', k) | ||
|  | if kk and kk == k then | ||
|  |   k = jj + 1 | ||
|  |   if type(name) == 'string' then | ||
|  |     t = parse_name(name, string.sub(s, kk+1, jj-1)) | ||
|  |   else | ||
|  |     error('Style error -- used a pre-parsed name with non-standard inter-token format string') | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | <<adjust [[tail]] to account for discretionality of ties, if any>>= | ||
|  | if find(tail, '%~%~$') then | ||
|  |   tail = string.sub(tail, 1, -2) -- denotes hard tie | ||
|  | elseif find(tail, '%~$') then | ||
|  |   if text_char_count(head) + text_char_count(tail) - 1 >= 3 then | ||
|  |     tail = string.gsub(tail, '%~$', ' ') | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \subsection{Line-wrapping output} | ||
|  | 
 | ||
|  | EXPLAIN THIS INTERFACE!!! | ||
|  | 
 | ||
|  | My [[max_print_line]] appears to be off by one from Oren Patashnik's. | ||
|  | <<exported Lua functions>>= | ||
|  | local min_print_line, max_print_line = 3, 79 | ||
|  | bibtex.hard_max = max_print_line | ||
|  | bibtex.doc.hard_max = 'int # largest line that avoids a forced line break (for wizards)' | ||
|  | bst.doc.writer = "io-handle * int option -> object # result:write(s) buffers and breaks lines" | ||
|  | function bst.writer(out, indent) | ||
|  |   indent = indent or 2 | ||
|  |   assert(indent + 10 < max_print_line) | ||
|  |   indent = string.rep(' ', indent) | ||
|  |   local gsub = string.gsub | ||
|  |   local buf = '' | ||
|  |   local function write(self, ...) | ||
|  |     local s = table.concat { ... } | ||
|  |     local lines = split(s, '\n') | ||
|  |     lines[1] = buf .. lines[1] | ||
|  |     buf = table.remove(lines) | ||
|  |     for i = 1, table.getn(lines) do | ||
|  |       local line = lines[i] | ||
|  |       if not find(line, '^%s+$') then -- no line of just whitespace | ||
|  |         line = gsub(line, '%s+$', '') | ||
|  |         while string.len(line) > max_print_line do | ||
|  |           <<emit initial part of line and reassign>> | ||
|  |         end | ||
|  |         out:write(line, '\n') | ||
|  |       end | ||
|  |     end | ||
|  |   end | ||
|  |   assert(out.write, "object passed to bst.writer does not have a write method") | ||
|  |   return { write = write } | ||
|  | end | ||
|  | <<emit initial part of line and reassign>>= | ||
|  | local last_pre_white, post_white | ||
|  | local i, j, n = 1, 1, string.len(line) | ||
|  | while i and i <= n and i <= max_print_line do | ||
|  |   i, j = find(line, '%s+', i) | ||
|  |   if i and i <= max_print_line + 1 then | ||
|  |     if i > min_print_line then last_pre_white, post_white = i - 1, j + 1 end | ||
|  |     i = j + 1 | ||
|  |   end | ||
|  | end | ||
|  | if last_pre_white then | ||
|  |   out:write(string.sub(line, 1, last_pre_white), '\n') | ||
|  |   if post_white > max_print_line + 2 then | ||
|  |     post_white = max_print_line + 2 -- bug-for-bug compatibility with bibtex | ||
|  |   end | ||
|  |   line = indent .. string.sub(line, post_white) | ||
|  | elseif n < bibtex.hard_max then | ||
|  |   out:write(line, '\n') | ||
|  |   line = '' | ||
|  | else -- ``unbreakable'' | ||
|  |   out:write(string.sub(line, 1, bibtex.hard_max-1), '%\n') | ||
|  |   line = string.sub(line, bibtex.hard_max) | ||
|  | end | ||
|  | @ | ||
|  | <<check constant values for consistency>>= | ||
|  | assert(min_print_line >= 3) | ||
|  | assert(max_print_line > min_print_line) | ||
|  | @  | ||
|  | \subsection{Functions copied from classic \bibtex} | ||
|  | 
 | ||
|  | \paragraph{Adding a period} | ||
|  | Find the last non-[[}]] character, and if it is not a sentence | ||
|  | terminator, add a period.  | ||
|  | <<exported Lua functions>>= | ||
|  | do | ||
|  |   local terminates_sentence = { ["."] = true, ["?"] = true, ["!"] = true } | ||
|  | 
 | ||
|  |   bst.doc.add_period = "string -> string # add period unless already .?!" | ||
|  |   function bst.add_period(s) | ||
|  |     local _, _, last = find(s, '([^%}])%}*$') | ||
|  |     if last and not terminates_sentence[last] then | ||
|  |       return s .. '.' | ||
|  |     else | ||
|  |       return s | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \paragraph{Case-changing} | ||
|  | 
 | ||
|  | Classic \bibtex\ has a [[change.case$]] function, which takes an | ||
|  | argument telling whether to change to lower case, upper case, or | ||
|  | ``title'' case (which has initial letters capitalized). | ||
|  | Because Lua supports first-class functions, it makes more sense just | ||
|  | to export three functions: [[lower]], [[title]], and [[upper]]. | ||
|  | <<exported Lua functions>>=   | ||
|  | do | ||
|  |   bst.doc.lower = "string -> string  # lower case according to bibtex rules" | ||
|  |   bst.doc.upper = "string -> string  # upper case according to bibtex rules" | ||
|  |   bst.doc.title = "string -> string  # title case according to bibtex rules" | ||
|  | 
 | ||
|  |   <<utilities for case conversion>> | ||
|  | 
 | ||
|  |   <<definitions of case-conversion functions>> | ||
|  | end | ||
|  | @  | ||
|  | Case conversion is complicated by the presence of brace-delimited | ||
|  | sequences, especially since there is one set of conventions for a ``special | ||
|  | character'' (brace-delimited sequence beginning with {\TeX} control | ||
|  | sequence) and | ||
|  | another set of conventions for other brace-delimited sequences. | ||
|  | To deal with them, we typically do an ``odd-even split'' on balanced | ||
|  | braces,  | ||
|  | then apply a ``normal'' conversion function to the odd elements and a | ||
|  | ``special'' conversion function to the even elements. | ||
|  | The application is done by [[oeapp]]. | ||
|  | <<utilities for case conversion>>= | ||
|  | local function oeapp(f, g, t) | ||
|  |   for i = 1, table.getn(t), 2 do | ||
|  |     t[i] = f(t[i]) | ||
|  |   end | ||
|  |   for i = 2, table.getn(t), 2 do | ||
|  |     t[i] = g(t[i]) | ||
|  |   end | ||
|  |   return t | ||
|  | end | ||
|  | @ | ||
|  | Upper- and lower-case conversion are easiest. | ||
|  | Non-specials are hit directly with [[string.lower]] or | ||
|  | [[string.upper]]; | ||
|  | for special characters, we use  utility called [[convert_special]]. | ||
|  | <<definitions of case-conversion functions>>= | ||
|  | local lower_special = convert_special(string.lower) | ||
|  | local upper_special = convert_special(string.upper) | ||
|  | 
 | ||
|  | function bst.lower(s) | ||
|  |   return table.concat(oeapp(string.lower, lower_special, brace_split(s))) | ||
|  | end | ||
|  | 
 | ||
|  | function bst.upper(s) | ||
|  |   return table.concat(oeapp(string.upper, upper_special, brace_split(s))) | ||
|  | end | ||
|  | @ | ||
|  | Here is [[convert_special]]. | ||
|  | If a special begins with an alphabetic control sequence, we convert | ||
|  | only elements between control sequences. | ||
|  | If a special begins with a nonalphabetic control sequence, we convert | ||
|  | the whole special as usual. | ||
|  | Finally, if a special does not begin with a control sequence, we leave | ||
|  | it the hell alone. | ||
|  | (This is the convention that allows us to put [[{FORTRAN}]] in a | ||
|  | \bibtex\ entry and be assured that capitalization is not lost.) | ||
|  | <<utilities for case conversion>>= | ||
|  | function convert_special(cvt) | ||
|  |   return function(s) | ||
|  |            if find(s, '^{\\(%a+)') then | ||
|  |              local t = odd_even_split(s, '\\%a+') | ||
|  |              for i = 1, table.getn(t), 2 do | ||
|  |                t[i] = cvt(t[i]) | ||
|  |              end | ||
|  |              return table.concat(t) | ||
|  |            elseif find(s, '^{\\') then | ||
|  |              return cvt(s) | ||
|  |            else | ||
|  |              return s | ||
|  |            end | ||
|  |          end | ||
|  | end | ||
|  | @  | ||
|  | Title conversion doesn't fit so nicely into the framework. | ||
|  | 
 | ||
|  | Function [[lower_later]] lowers all but the first letter of a string. | ||
|  | <<utilities for case conversion>>= | ||
|  | local function lower_later(s) | ||
|  |   return string.sub(s, 1, 1) .. string.lower(string.sub(s, 2)) | ||
|  | end | ||
|  | @  | ||
|  | For title conversion, we don't mess with a token that follows a colon. | ||
|  | Hence, we must maintain [[prev]] and can't use [[convert_special]]. | ||
|  | <<definitions of case-conversion functions>>= | ||
|  | local function title_special(s, prev) | ||
|  |   if find(prev, ':%s+$') then | ||
|  |     return s | ||
|  |   else | ||
|  |     if find(s, '^{\\(%a+)') then | ||
|  |       local t = odd_even_split(s, '\\%a+') | ||
|  |       for i = 1, table.getn(t), 2 do | ||
|  |         local prev = t[i-1] or prev | ||
|  |         if find(prev, ':%s+$') then | ||
|  |           assert(false, 'bugrit') | ||
|  |         else | ||
|  |           t[i] = string.lower(t[i]) | ||
|  |         end | ||
|  |       end | ||
|  |       return table.concat(t) | ||
|  |     elseif find(s, '^{\\') then | ||
|  |       return string.lower(s) | ||
|  |     else | ||
|  |       return s | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | Internal function [[recap]] deals with the damn colons. | ||
|  | <<definitions of case-conversion functions>>= | ||
|  | function bst.title(s) | ||
|  |   local function recap(s, first) | ||
|  |     local parts = odd_even_split(s, '%:%s+') | ||
|  |     parts[1] = first and lower_later(parts[1]) or string.lower(parts[1]) | ||
|  |     for i = (first and 3 or 1), table.getn(parts), 2 do | ||
|  |       parts[i] = lower_later(parts[i]) | ||
|  |     end | ||
|  |     return table.concat(parts) | ||
|  |   end | ||
|  |   local t = brace_split(s) | ||
|  |   for i = 1, table.getn(t), 2 do -- elements outside specials get recapped | ||
|  |     t[i] = recap(t[i], i == 1) | ||
|  |   end | ||
|  |   for i = 2, table.getn(t), 2 do -- specials are, well, special | ||
|  |     local prev = t[i-1] | ||
|  |     if i == 2 and not find(prev, '%S') then prev = ': ' end  | ||
|  |     t[i] = title_special(t[i], prev) | ||
|  |   end | ||
|  |   return table.concat(t) | ||
|  | end | ||
|  | @  | ||
|  | \paragraph{Purification} | ||
|  | 
 | ||
|  | Purification (classic [[purify$]]) involves removing non-alphanumeric | ||
|  | characters. | ||
|  | Each sequence of ``separator'' characters becomes a single space. | ||
|  | <<exported Lua functions>>= | ||
|  | do | ||
|  |   bst.doc.purify = "string -> string # remove nonalphanumeric, non-sep chars" | ||
|  |   local high_alpha = string.char(128) .. '-' .. string.char(255) | ||
|  |   local sep_white_char = '[' .. sep_chars .. '%s]' | ||
|  |   local disappears     = '[^' .. sep_chars .. high_alpha .. '%s%w]' | ||
|  |   local gsub = string.gsub | ||
|  |   local function purify(s) | ||
|  |     return gsub(gsub(s, sep_white_char, ' '), disappears, '') | ||
|  |   end | ||
|  |   -- special characters are purified by removing all non-alphanumerics, | ||
|  |   -- including white space and sep-chars | ||
|  |   local function spurify(s) | ||
|  |     return gsub(s, '[^%w' .. high_alpha .. ']+', '') | ||
|  |   end | ||
|  |   local purify_all_chars = { oe = true, OE = true, ae = true, AE = true, ss = true } | ||
|  |    | ||
|  |   function bst.purify(s) | ||
|  |     local t = brace_split(s) | ||
|  |     for i = 1, table.getn(t) do | ||
|  |       local _, k, cmd = find(t[i], '^{\\(%a+)%s*') | ||
|  |       if k then | ||
|  |         if lower_specials[cmd] or upper_specials[cmd] then | ||
|  |           if not purify_all_chars[cmd] then | ||
|  |             cmd = string.sub(cmd, 1, 1) | ||
|  |           end | ||
|  |           t[i] = cmd .. spurify(string.sub(t[i], k+1)) | ||
|  |         else | ||
|  |           t[i] = spurify(string.sub(t[i], k+1)) | ||
|  |         end | ||
|  |       elseif find(t[i], '^{\\') then | ||
|  |         t[i] = spurify(t[i]) | ||
|  |       else | ||
|  |         t[i] = purify(t[i]) | ||
|  |       end | ||
|  |     end | ||
|  |     return table.concat(t) | ||
|  |   end | ||
|  | end   | ||
|  | @  | ||
|  | \paragraph{Text prefix} | ||
|  | 
 | ||
|  | Function [[text_prefix]] (classic [[text.prefix$]]) takes an initial | ||
|  | substring of a string, with the proviso that a \bibtex\ ``special | ||
|  | character'' sequence counts as a single character. | ||
|  | <<exported Lua functions>>= | ||
|  | bst.doc.text_prefix = "string * int -> string  # take first n chars with special == 1" | ||
|  | function bst.text_prefix(s, n) | ||
|  |   local t = brace_split(s) | ||
|  |   local answer, rem = '', n | ||
|  |   for i = 1, table.getn(t), 2 do | ||
|  |     answer = answer .. string.sub(t[i], 1, rem) | ||
|  |     rem = rem - string.len(t[i]) | ||
|  |     if rem <= 0 then return answer end | ||
|  |     if find(t[i+1], '^{\\') then | ||
|  |       answer = answer .. t[i+1] | ||
|  |       rem = rem - 1 | ||
|  |     else | ||
|  |       <<take up to [[rem]] characters from [[t[i+1]]], not counting braces>> | ||
|  |     end | ||
|  |   end | ||
|  |   return answer | ||
|  | end | ||
|  | <<take up to [[rem]] characters from [[t[i+1]]], not counting braces>>= | ||
|  | local s = t[i+1] | ||
|  | local braces = 0 | ||
|  | local sub = string.sub | ||
|  | for i = 1, string.len(s) do | ||
|  |   local c = sub(s, i, i) | ||
|  |   if c == '{' then | ||
|  |     braces = braces + 1 | ||
|  |   elseif c == '}' then | ||
|  |     braces = braces + 1 | ||
|  |   else | ||
|  |     rem = rem - 1 | ||
|  |     if rem == 0 then | ||
|  |       return answer .. string.sub(s, 1, i) .. string.rep('}', braces) | ||
|  |     end | ||
|  |   end | ||
|  | end | ||
|  | answer = answer .. s | ||
|  | @  | ||
|  | \paragraph{Emptiness test} | ||
|  | 
 | ||
|  | Function [[empty]] (classic [[empty$]]) tells if a value is empty; | ||
|  | i.e., it is missing (nil) or it is only white space. | ||
|  | <<exported Lua functions>>= | ||
|  | bst.doc.empty = "string option -> bool # is string there and holding nonspace?" | ||
|  | function bst.empty(s) | ||
|  |   return s == nil or not find(s, '%S') | ||
|  | end | ||
|  | @  | ||
|  | @  | ||
|  | \subsection{Other utilities} | ||
|  | 
 | ||
|  | \paragraph{A stable sort} | ||
|  | 
 | ||
|  | Function [[bst.sort]] is like [[table.sort]] only stable. | ||
|  | It is needed because classic \bibtex\ uses a stable sort. | ||
|  | Its interface is the same as [[table.sort]]. | ||
|  | <<exported Lua functions>>= | ||
|  | bst.doc.sort = 'value list * compare option # like table.sort, but stable' | ||
|  | function bst.sort(t, lt) | ||
|  |   lt = lt or function(x, y) return x < y end | ||
|  |   local pos = { } --- position of each element in original table | ||
|  |   for i = 1, table.getn(t) do pos[t[i]] = i end | ||
|  |   local function nlt(x, y) | ||
|  |     if lt(x, y) then | ||
|  |       return true | ||
|  |     elseif lt(y, x) then | ||
|  |       return false | ||
|  |     else -- elements look equal | ||
|  |       return pos[x] < pos[y] | ||
|  |     end | ||
|  |   end | ||
|  |   return table.sort(t, nlt) | ||
|  | end | ||
|  | bst.doc.sort = 'value list * compare option -> unit  # stable sort' | ||
|  | @  | ||
|  | \paragraph{The standard months} | ||
|  | 
 | ||
|  | Every style is required to recognize the months, so we make it easy to | ||
|  | create a fresh table with either full or abbreviated months. | ||
|  | <<exported Lua functions>>= | ||
|  | bst.doc.months = "string option -> table # macros table containing months" | ||
|  | function bst.months(what) | ||
|  |   local m = { | ||
|  |     jan = "January", feb = "February", mar = "March", apr = "April", | ||
|  |     may = "May", jun = "June", jul = "July", aug = "August", | ||
|  |     sep = "September", oct = "October", nov = "November", dec = "December" } | ||
|  |   if what == 'short' or what == 3 then | ||
|  |     for k, v in pairs(m) do | ||
|  |       m[k] = string.sub(v, 1, 3) | ||
|  |     end | ||
|  |   end | ||
|  |   return m | ||
|  | end | ||
|  | @  | ||
|  | \paragraph{Comma-separated lists} | ||
|  | 
 | ||
|  | The function [[commafy]] takes a list and inserts commas and | ||
|  | [[and]] (or [[or]]) using American conventions. | ||
|  | For example, | ||
|  | \begin{quote} | ||
|  | [[commafy { 'Graham', 'Knuth', 'Patashnik' }]] | ||
|  | \end{quote} | ||
|  | returns [['Graham, Knuth, and Patashnik']], | ||
|  | but | ||
|  | \begin{quote} | ||
|  | [[commafy { 'Knuth', 'Plass' }]]  | ||
|  | \end{quote} | ||
|  | returns [['Knuth and Plass']]. | ||
|  | <<exported Lua functions>>= | ||
|  | bst.doc.commafy = "string list -> string # concat separated by commas, and" | ||
|  | function bst.commafy(t, andword) | ||
|  |   andword = andword or 'and' | ||
|  |   local n = table.getn(t) | ||
|  |   if n == 1 then | ||
|  |     return t[1] | ||
|  |   elseif n == 2 then | ||
|  |     return t[1] .. ' ' .. andword .. ' ' .. t[2] | ||
|  |   else | ||
|  |     local last = t[n] | ||
|  |     t[n] = andword .. ' ' .. t[n] | ||
|  |     local answer = table.concat(t, ', ') | ||
|  |     t[n] = last | ||
|  |     return answer | ||
|  |   end | ||
|  | end | ||
|  | @  | ||
|  | \section{Testing and so on} | ||
|  | 
 | ||
|  | Here are a couple of test functions I used during development that I | ||
|  | thought might be worth keeping around. | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.cat = 'string -> unit # emit the named bib file in bib format' | ||
|  | function bibtex.cat(bib) | ||
|  |   local rdr = bibtex.open(bib, bst.months()) | ||
|  |   if not rdr then | ||
|  |     rdr = assert(bibtex.open(assert(bibtex.bibpath(bib)), bst.months())) | ||
|  |   end | ||
|  |   for type, key, fields in entries(rdr) do | ||
|  |     if type == nil then | ||
|  |       break | ||
|  |     elseif not type then | ||
|  |       io.stderr:write('Error on key ', key, '\n') | ||
|  |     else | ||
|  |       emit_tkf.bib(io.stdout, type, key, fields) | ||
|  |     end | ||
|  |   end | ||
|  |   bibtex.close(rdr) | ||
|  | end | ||
|  | @  | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.count = 'string list -> unit # take list of bibs and print number of entries' | ||
|  | function bibtex.count(argv) | ||
|  |   local bibs = { } | ||
|  |   local macros = { } | ||
|  |   local n = 0 | ||
|  |   <<make [[bibs]] the list of pathnames implied by [[argv]]>> | ||
|  |   local function warn() end | ||
|  |   for _, bib in ipairs(bibs) do | ||
|  |     local rdr = bibtex.open(bib, macros) | ||
|  |     for type, key, fields in entries(rdr) do | ||
|  |       if type == nil then | ||
|  |         break | ||
|  |       elseif type then | ||
|  |         n = n + 1 | ||
|  |       end | ||
|  |     end | ||
|  |     rdr:close() | ||
|  |   end | ||
|  |   printf("%d\n", n) | ||
|  | end | ||
|  | @  | ||
|  | <<exported Lua functions>>= | ||
|  | bibtex.doc.all_entries = "bibname * macro-table -> preamble * citation list" | ||
|  | function bibtex.all_entries(bib, macros) | ||
|  |   macros = macros or bst.months() | ||
|  |   warn = warn or emit_warning | ||
|  |   local rdr = bibtex.open(bib, macros, warn) | ||
|  |   if not rdr then | ||
|  |     rdr = assert(bibtex.open(assert(bibtex.bibpath(bib)), macros, warn), | ||
|  |                  "could not open bib file " .. bib) | ||
|  |   end | ||
|  |   local cs = { } | ||
|  |   local seen = { } | ||
|  |   for type, key, fields in entries(rdr) do | ||
|  |     if type == nil then | ||
|  |       break | ||
|  |     elseif not type then | ||
|  |       io.stderr:write(key, '\n') | ||
|  |     elseif not seen[key] then | ||
|  |       seen[key] = true | ||
|  |       table.insert(cs, { type = type, key = key, fields = fields, file = bib, | ||
|  |                          line = rdr.entry_line }) | ||
|  |     end | ||
|  |   end | ||
|  |   local p = assert(rdr.preamble) | ||
|  |   rdr:close() | ||
|  |   return p, cs | ||
|  | end | ||
|  | @  | ||
|  | \section{Laundry list} | ||
|  | 
 | ||
|  | THINGS TO DO: | ||
|  | \begin{itemize} | ||
|  | \item TRANSITION THE C~CODE TO LUA NATIVE ERROR HANDLING ([[luaL_error]] and [[pcall]]) | ||
|  | \item NO WARNING FOR DUPLICATE FIELDS NOT DEFINED IN .BST? | ||
|  | \item STANDARD WARNING FOR REPEATED ENTRY? | ||
|  | \item  | ||
|  | NOT ENFORCED: An entry type must be | ||
|  | defined in the \texttt{.bst} file if this entry is to be included in the | ||
|  | reference list. | ||
|  | \item  | ||
|  | THE WHOLE BST-SEARCH THING NEEDS MORE CARE. | ||
|  | 
 | ||
|  |        BibTeX searches the directories in the path defined  by  the  BSTINPUTS | ||
|  |        environment  variable  for .bst files. If BSTINPUTS is not set, it uses | ||
|  |        the system default.  For .bib files, it uses the BIBINPUTS  environment | ||
|  |        variable  if  that  is  set, otherwise the default.  See tex(1) for the | ||
|  |        details of the searching. | ||
|  | 
 | ||
|  |        If the environment variable TEXMFOUTPUT is set, BibTeX attempts to  put | ||
|  |        its output files in it, if they cannot be put in the current directory. | ||
|  |        Again, see tex(1).  No special searching is done for the .aux file. | ||
|  | \item | ||
|  | RATIONALIZE ERROR MACHINERY WITH WARNING, ERROR, AND FATAL CASES -- | ||
|  | AND COUNTS. | ||
|  | \item | ||
|  | Here are some things that \bibtex\ does that \nbibtex\ should do: | ||
|  | \begin{enumerate} | ||
|  | \item | ||
|  | Writes a log file | ||
|  | \item | ||
|  | Counts warnings, or if there is an error, counts errors instead | ||
|  | \end{enumerate} | ||
|  | \end{itemize} | ||
|  | 
 | ||
|  | 
 | ||
|  | \end{document} | ||
|  | @ |