.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
.\" $FreeBSD: src/lib/libc/regex/regex.3,v 1.9 2001/10/01 16:08:58 ru Exp $
.\"
.Dd March 20, 1994
.Dt REGEX 3
.Os
.Sh NAME
.Nm regcomp ,
.Nm regexec ,
.Nm regerror ,
.Nm regfree
.Nd regular-expression library
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In regex.h
.Ft int
.Fn regcomp "regex_t * __restrict preg" "const char * __restrict pattern" "int cflags"
.Ft int
.Fo regexec
.Fa "const regex_t * __restrict preg" "const char * __restrict string"
.Fa "size_t nmatch" "regmatch_t pmatch[__restrict]" "int eflags"
.Fc
.Ft size_t
.Fo regerror
.Fa "int errcode" "const regex_t * __restrict preg"
.Fa "char * __restrict errbuf" "size_t errbuf_size"
.Fc
.Ft void
.Fn regfree "regex_t *preg"
.Sh DESCRIPTION
These routines implement
.St -p1003.2
regular expressions
.Pq Do RE Dc Ns s ;
see
.Xr re_format 7 .
The
.Fn regcomp
function compiles an RE written as a string into an internal form,
.Fn regexec
matches that internal form against a string and reports results,
.Fn regerror
transforms error codes from either into human-readable messages,
and
.Fn regfree
frees any dynamically-allocated storage used by the internal form
of an RE.
.Pp
The header
.Aq Pa regex.h
declares two structure types,
.Ft regex_t
and
.Ft regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions,
a type
.Ft regoff_t ,
and a number of constants with names starting with
.Dq Dv REG_ .
.Pp
The
.Fn regcomp
function compiles the regular expression contained in the
.Fa pattern
string,
subject to the flags in
.Fa cflags ,
and places the results in the
.Ft regex_t
structure pointed to by
.Fa preg .
The
.Fa cflags
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width REG_EXTENDED
.It Dv REG_EXTENDED
Compile modern
.Pq Dq extended
REs,
rather than the obsolete
.Pq Dq basic
REs that
are the default.
.It Dv REG_BASIC
This is a synonym for 0,
provided as a counterpart to
.Dv REG_EXTENDED
to improve readability.
.It Dv REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary,
so the
.Dq RE
is a literal string.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Dv REG_EXTENDED
and
.Dv REG_NOSPEC
must not be used
in the same call to
.Fn regcomp .
.It Dv REG_ICASE
Compile for matching that ignores upper/lower case distinctions.
See
.Xr re_format 7 .
.It Dv REG_NOSUB
Compile for matching that need only report success or failure,
not what was matched.
.It Dv REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
.Ql [^
bracket expressions and
.Ql .\&
never match newline,
a
.Ql ^\&
anchor matches the null string after any newline in the string
in addition to its normal function,
and the
.Ql $\&
anchor matches the null string before any newline in the
string in addition to its normal function.
.It Dv REG_PEND
The regular expression ends,
not at the first NUL,
but just before the character pointed to by the
.Va re_endp
member of the structure pointed to by
.Fa preg .
The
.Va re_endp
member is of type
.Ft "const char *" .
This flag permits inclusion of NULs in the RE;
they are considered ordinary characters.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.El
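.Pp
The effect of the compile-time flags can be checked with a small helper.
The following is a minimal sketch, not part of the library:
match_with_cflags is a hypothetical name, and the helper always adds
REG_NOSUB because it only reports success or failure.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile `pattern` with the given cflags and test it against `string`.
 * Returns 1 on a match, 0 on REG_NOMATCH, and -1 if compilation failed
 * or regexec() reported some other error.
 * (match_with_cflags is a hypothetical helper name.)
 */
int
match_with_cflags(const char *pattern, const char *string, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only success or failure is needed here. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, string, 0, NULL, 0);
	regfree(&re);
	if (rc == 0)
		return 1;
	return rc == REG_NOMATCH ? 0 : -1;
}
```

For instance, the extended RE "^bar$" is not expected to match the string
"foo\nbar\nbaz" unless REG_NEWLINE is included in cflags, since only then
do the anchors match next to embedded newlines.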
.Pp
When successful,
.Fn regcomp
returns 0 and fills in the structure pointed to by
.Fa preg .
One member of that structure
(other than
.Va re_endp )
is publicized:
.Va re_nsub ,
of type
.Ft size_t ,
contains the number of parenthesized subexpressions within the RE
(except that the value of this member is undefined if the
.Dv REG_NOSUB
flag was used).
If
.Fn regcomp
fails, it returns a non-zero error code;
see
.Sx DIAGNOSTICS .
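.Pp
A caller typically checks the return value of regcomp() before touching
re_nsub.
The sketch below illustrates that pattern under stated assumptions:
compile_extended is a hypothetical wrapper, it always uses REG_EXTENDED,
and it reports failures through regerror().

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile `pattern` as an extended RE into *re.
 * On success, returns 0 and stores the subexpression count in *nsub.
 * On failure, returns the non-zero regcomp() code and writes the
 * regerror() text into errbuf (truncated to errbuf_size bytes).
 * (compile_extended is a hypothetical helper name.)
 */
int
compile_extended(regex_t *re, const char *pattern, size_t *nsub,
    char *errbuf, size_t errbuf_size)
{
	int rc = regcomp(re, pattern, REG_EXTENDED);

	if (rc != 0) {
		regerror(rc, re, errbuf, errbuf_size);
		return rc;
	}
	*nsub = re->re_nsub;	/* defined because REG_NOSUB was not used */
	return 0;
}
```

On success the caller owns the compiled RE and should release it with
regfree() when done; after a failed compile there is nothing to free.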
.Pp
The
.Fn regexec
function matches the compiled RE pointed to by
.Fa preg
against the
.Fa string ,
subject to the flags in
.Fa eflags ,
and reports results using
.Fa nmatch ,
.Fa pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.Fn regcomp .
The compiled form is not altered during execution of
.Fn regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
.Pp
By default,
the NUL-terminated string pointed to by
.Fa string
is considered to be the text of an entire line, minus any terminating
newline.
The
.Fa eflags
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width REG_STARTEND
.It Dv REG_NOTBOL
The first character of
the string
is not the beginning of a line, so the
.Ql ^\&
anchor should not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_NOTEOL
The NUL terminating
the string
does not end a line, so the
.Ql $\&
anchor should not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_STARTEND
The string is considered to start at
.Fa string
+
.Fa pmatch Ns [0]. Ns Va rm_so
and to have a terminating NUL located at
.Fa string
+
.Fa pmatch Ns [0]. Ns Va rm_eo
(there need not actually be a NUL at that location),
regardless of the value of
.Fa nmatch .
See below for the definition of
.Fa pmatch
and
.Fa nmatch .
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
Note that a non-zero
.Va rm_so
does not imply
.Dv REG_NOTBOL ;
.Dv REG_STARTEND
affects only the location of the string,
not how it is matched.
.El
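.Pp
REG_NOTBOL matters when a search resumes in the middle of a line.
The following sketch counts non-overlapping matches in a multi-line text
by restarting regexec() after each match; count_line_matches is a
hypothetical helper, and the choice of REG_EXTENDED | REG_NEWLINE is an
assumption made for the illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count non-overlapping matches of `pattern` in a multi-line `text`,
 * compiling with REG_EXTENDED | REG_NEWLINE.
 * Returns 0 if the pattern does not compile.
 * (count_line_matches is a hypothetical helper name.)
 */
size_t
count_line_matches(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	const char *p = text;
	size_t count = 0;
	int eflags = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
		return 0;
	while (regexec(&re, p, 1, &m, eflags) == 0) {
		count++;
		if (m.rm_eo > 0)
			p += m.rm_eo;	/* resume after this match */
		else if (*p != '\0')
			p++;		/* empty match at p: step forward */
		else
			break;		/* empty match at end of text */
		/* p starts a line only if it directly follows a newline. */
		eflags = (p[-1] == '\n') ? 0 : REG_NOTBOL;
	}
	regfree(&re);
	return count;
}
```

Without REG_NOTBOL on the resumed calls, a pattern such as "^f" would be
counted twice in the string "ff", because each restart would look like the
beginning of a line.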
.Pp
See
.Xr re_format 7
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.Fa string .
.Pp
Normally,
.Fn regexec
returns 0 for success and the non-zero code
.Dv REG_NOMATCH
for failure.
Other non-zero error codes may be returned in exceptional situations;
see
.Sx DIAGNOSTICS .
.Pp
If
.Dv REG_NOSUB
was specified in the compilation of the RE,
or if
.Fa nmatch
is 0,
.Fn regexec
ignores the
.Fa pmatch
argument (but see below for the case where
.Dv REG_STARTEND
is specified).
Otherwise,
.Fa pmatch
points to an array of
.Fa nmatch
structures of type
.Ft regmatch_t .
Such a structure has at least the members
.Va rm_so
and
.Va rm_eo ,
both of type
.Ft regoff_t
(a signed arithmetic type at least as large as an
.Ft off_t
and a
.Ft ssize_t ) ,
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.Fa string
argument given to
.Fn regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.Pp
The 0th member of the
.Fa pmatch
array is filled in to indicate what substring of
.Fa string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.Va i
reports subexpression
.Va i ,
with subexpressions counted (starting at 1) by the order of their opening
parentheses in the RE, left to right.
Unused entries in the array (corresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE (that is,
.Va i
>
.Fa preg Ns -> Ns Va re_nsub ) )
have both
.Va rm_so
and
.Va rm_eo
set to -1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
(Note in particular that when the RE
.Ql "(b*)+"
matches
.Ql bbb ,
the parenthesized subexpression matches each of the three
.So Li b Sc Ns s
and then
an infinite number of empty strings following the last
.Ql b ,
so the reported substring is one of the empties.)
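.Pp
The pmatch offsets described above can be turned into substrings as in the
following sketch.
The helper copy_group and the MAX_GROUPS limit are assumptions made for
the example; a real caller might size the array from re_nsub instead.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10	/* illustrative limit: whole match plus 9 groups */

/*
 * Match the extended RE `pattern` against `string` and copy the text of
 * subexpression `group` (0 = whole match) into out[outsize].
 * Returns the substring length, or -1 if the pattern is invalid, nothing
 * matched, the group did not participate, or `out` is too small.
 * (copy_group is a hypothetical helper name.)
 */
long
copy_group(const char *pattern, const char *string, size_t group,
    char *out, size_t outsize)
{
	regex_t re;
	regmatch_t pm[MAX_GROUPS];
	long len = -1;

	if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, string, MAX_GROUPS, pm, 0) == 0 &&
	    pm[group].rm_so != -1) {
		/* rm_eo is one past the last matched character. */
		size_t n = (size_t)(pm[group].rm_eo - pm[group].rm_so);

		if (n < outsize) {
			memcpy(out, string + pm[group].rm_so, n);
			out[n] = '\0';
			len = (long)n;
		}
	}
	regfree(&re);
	return len;
}
```

Checking rm_so against -1 covers both kinds of unused entry: a
subexpression that took no part in the match and an index beyond re_nsub.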
.Pp
If
.Dv REG_STARTEND
is specified,
.Fa pmatch
must point to at least one
.Ft regmatch_t
(even if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified),
to hold the input offsets for
.Dv REG_STARTEND .
Use for output is still entirely controlled by
.Fa nmatch ;
if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified,
the value of
.Fa pmatch Ns [0]
will not be changed by a successful
.Fn regexec .
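.Pp
A possible use of REG_STARTEND is searching a slice of a larger buffer
without copying it.
The sketch below assumes a platform whose regex.h defines REG_STARTEND
(it is an extension and will not compile elsewhere); search_range is a
hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Search buf[start..end) for the compiled RE *re using REG_STARTEND.
 * On success returns 0 and stores the match offsets in *m; they are
 * measured from buf, not from buf + start.
 * Otherwise returns the non-zero regexec() code.
 * (search_range is a hypothetical helper name.)
 */
int
search_range(const regex_t *re, const char *buf, size_t start, size_t end,
    regmatch_t *m)
{
	/* pmatch[0] carries the input range in and the match out. */
	m->rm_so = (regoff_t)start;
	m->rm_eo = (regoff_t)end;
	return regexec(re, buf, 1, m, REG_STARTEND);
}
```

Because the same regmatch_t is used for input and output, the caller must
reset rm_so and rm_eo before every call, as the helper does.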
.Pp
The
.Fn regerror
function maps a non-zero
.Fa errcode
from either
.Fn regcomp
or
.Fn regexec
to a human-readable, printable message.
If
.Fa preg
is
.No non\- Ns Dv NULL ,
the error code should have arisen from use of
the
.Ft regex_t
pointed to by
.Fa preg ,
and if the error code came from
.Fn regcomp ,
it should have been the result from the most recent
.Fn regcomp
using that
.Ft regex_t .
.No ( Fn regerror
may be able to supply a more detailed message using information
from the
.Ft regex_t . )
The
.Fn regerror
function places the NUL-terminated message into the buffer pointed to by
.Fa errbuf ,
limiting the length (including the NUL) to at most
.Fa errbuf_size
bytes.
If the whole message won't fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message (including terminating NUL).
If
.Fa errbuf_size
is 0,
.Fa errbuf
is ignored but the return value is still correct.
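.Pp
Since the return value is the full size needed, a message buffer can be
sized exactly by calling regerror() twice.
The following is a minimal sketch of that idiom; regerror_dup is a
hypothetical helper name.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Return a malloc()ed copy of the message for `errcode`, sized exactly
 * by asking regerror() for the required length first.
 * `preg` may be NULL. The caller frees the result; NULL means that
 * malloc() failed.
 * (regerror_dup is a hypothetical helper name.)
 */
char *
regerror_dup(int errcode, const regex_t *preg)
{
	/* errbuf_size 0: nothing is written, but the needed size is returned. */
	size_t need = regerror(errcode, preg, NULL, 0);
	char *msg = malloc(need);

	if (msg != NULL)
		regerror(errcode, preg, msg, need);
	return msg;
}
```

A fixed-size buffer is simpler when truncated messages are acceptable;
the two-call form only pays off when the full text must be preserved.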
.Pp
If the
.Fa errcode
given to
.Fn regerror
is first ORed with
.Dv REG_ITOA ,
the
.Dq message
that results is the printable name of the error code,
e.g.\&
.Dq Dv REG_NOMATCH ,
rather than an explanation thereof.
If
.Fa errcode
is
.Dv REG_ATOI ,
then
.Fa preg
shall be
.No non\- Ns Dv NULL
and the
.Va re_endp
member of the structure it points to
must point to the printable name of an error code;
in this case, the result in
.Fa errbuf
is the decimal digits of
the numeric value of the error code
(0 if the name is not recognized).
.Dv REG_ITOA
and
.Dv REG_ATOI
are intended primarily as debugging facilities;
they are extensions,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
Be warned also that they are considered experimental and changes are possible.
.Pp
The
.Fn regfree
function frees any dynamically-allocated storage associated with the compiled RE
pointed to by
.Fa preg .
The remaining
.Ft regex_t
is no longer a valid compiled RE
and the effect of supplying it to
.Fn regexec
or
.Fn regerror
is undefined.
.Pp
None of these functions references global variables except for tables
of constants;
all are safe for use from multiple threads if the arguments are safe.
.Sh IMPLEMENTATION CHOICES
There are a number of decisions that
.St -p1003.2
leaves up to the implementor,
either by explicitly saying
.Dq undefined
or by virtue of their being
forbidden by the RE grammar.
This implementation treats them as follows.
.Pp
See
.Xr re_format 7
for a discussion of the definition of case-independent matching.
.Pp
There is no particular limit on the length of REs,
except insofar as memory is limited.
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
See
.Sx BUGS
for one short RE using them
that will run almost any system out of memory.
.Pp
A backslashed character other than one specifically given a magic meaning
by
.St -p1003.2
(such magic meanings occur only in obsolete
.Bq Dq basic
REs)
is taken as an ordinary character.
.Pp
Any unmatched
.Ql [\&
is a
.Dv REG_EBRACK
error.
.Pp
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.Pp
.Dv RE_DUP_MAX ,
the limit on repetition counts in bounded repetitions, is 255.
.Pp
A repetition operator
.Po Ql ?\& ,
.Ql *\& ,
.Ql +\& ,
.No or bounds Pc
cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow
.Ql ^\&
or
.Ql |\& .
.Pp
.Ql |\&
cannot appear first or last in a (sub)expression or after another
.Ql |\& ,
i.e. an operand of
.Ql |\&
cannot be an empty subexpression.
An empty parenthesized subexpression,
.Ql "()" ,
is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.Pp
A
.Ql {\&
followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A
.Ql {\&
.Em not
followed by a digit is considered an ordinary character.
.Pp
.Ql ^\&
and
.Ql $\&
beginning and ending subexpressions in obsolete
.Pq Dq basic
REs are anchors, not ordinary characters.
.Sh SEE ALSO
.Xr grep 1 ,
.Xr re_format 7
.Pp
.St -p1003.2 ,
sections 2.8 (Regular Expression Notation)
and
B.5 (C Binding for Regular Expression Matching).
.Sh DIAGNOSTICS
Non-zero error codes from
.Fn regcomp
and
.Fn regexec
include the following:
.Pp
.Bl -tag -width REG_ECOLLATE -compact
.It Dv REG_NOMATCH
.Fn regexec
failed to match
.It Dv REG_BADPAT
invalid regular expression
.It Dv REG_ECOLLATE
invalid collating element
.It Dv REG_ECTYPE
invalid character class
.It Dv REG_EESCAPE
.Ql \e
applied to unescapable character
.It Dv REG_ESUBREG
invalid backreference number
.It Dv REG_EBRACK
brackets
.Ql "[ ]"
not balanced
.It Dv REG_EPAREN
parentheses
.Ql "( )"
not balanced
.It Dv REG_EBRACE
braces
.Ql "{ }"
not balanced
.It Dv REG_BADBR
invalid repetition count(s) in
.Ql "{ }"
.It Dv REG_ERANGE
invalid character range in
.Ql "[ ]"
.It Dv REG_ESPACE
ran out of memory
.It Dv REG_BADRPT
.Ql ?\& ,
.Ql *\& ,
or
.Ql +\&
operand invalid
.It Dv REG_EMPTY
empty (sub)expression
.It Dv REG_ASSERT
can't happen - you found a bug
.It Dv REG_INVARG
invalid argument, e.g. negative-length string
.El
.Sh HISTORY
Originally written by
.An Henry Spencer .
Altered for inclusion in the
.Bx 4.4
distribution.
.Sh BUGS
This is an alpha release with known defects.
Please report problems.
.Pp
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.Pp
The performance of
.Fn regexec
is poor.
This will improve with later releases.
An
.Fa nmatch
exceeding 0 is expensive;
.Fa nmatch
exceeding 1 is worse.
The
.Fn regexec
function is largely insensitive to RE complexity
.Em except
that back
references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
.Pp
The
.Fn regcomp
function implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
will (eventually) run almost any existing machine out of swap space.
.Pp
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.Pp
Due to a mistake in
.St -p1003.2 ,
things like
.Ql "a)b"
are legal REs because
.Ql )\&
is
a special character only in the presence of a previous unmatched
.Ql (\& .
This can't be fixed until the spec is fixed.
.Pp
The standard's definition of back references is vague.
For example, does
.Ql "a\e(\e(b\e)*\e2\e)*d"
match
.Ql "abbbd" ?
Until the standard is clarified,
behavior in such cases should not be relied on.
.Pp
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.