Hi All,

Are there any APIs that can build regular expressions?

Regards, GJ
Posted on 2003-05-13 11:19:32 by Green Joe
You might give the GNU regex source a try; if you translate the C header file, you can use it with your assembler program.

the source is
http://ftp.gnu.org/pub/gnu/regex/regex-0.12.tar.gz
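
For anyone translating the header, it helps to see how small the calling sequence is. The sketch below is only an illustration, assuming the POSIX-style regcomp/regexec/regfree interface declared by the system's <regex.h>; the function name regex_matches is made up for this example, and the library's other entry points are not shown.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `subject` matches the POSIX extended regex `pattern`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * (regex_matches is a hypothetical helper name, not a library call.) */
int regex_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB skips capture bookkeeping. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

From assembler, the same three calls would be made in the same order: compile once, execute as often as needed, then free.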
Posted on 2003-05-13 12:57:02 by _Servil_
Are there any APIs that can build regular expressions?
No, but there are libraries out there for that, though not necessarily free ones.
Posted on 2003-05-13 17:10:01 by sluggy
Posted on 2003-05-13 19:47:06 by iblis
neat, thx iblis :)
Posted on 2003-05-13 20:01:47 by Hiroshimator
I had a look at the GNU source I mentioned above, and after some adjustments it should work fine with MASM. I hope the compiled library won't violate any rules, as its source is freeware.
Posted on 2003-05-14 11:36:16 by _Servil_
Thanx a lot to everybody,

Actually, the GNUWin32 site lists as many as four different regular expression libraries to download:

* PCRE (Perl Compatible Regular Expressions)
* RegEx-GNU
* RegEx-Spencer
* Rx

There is a lot of work ahead to figure out which library is best for which purpose. It would be really nice to get some short insights from people who have already used one of these libraries in their applications.

    Regards, GJ
Posted on 2003-05-14 12:30:06 by Green Joe
I've only used PCRE thus far.

It's very easy to use, and the API is small so translating the .h to .inc shouldn't be very time consuming. It also lets you specify your own malloc() and free() routines for it to call.

I never tested for speed, but the project I used it in had no problems speed-wise. Parsing 20 MB worth of HTML documents multiple times took about 2 minutes, and that includes all of the other non-regex operations performed on them.
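
On the malloc()/free() point: PCRE's classic C API exposes its allocator as two global function pointers, pcre_malloc and pcre_free, which the caller may reassign. The sketch below mimics that pattern with stand-in names so it compiles without the library; lib_malloc, lib_free, lib_compile and lib_release are all hypothetical and are not PCRE calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for a library's allocator hooks (hypothetical names). */
void *(*lib_malloc)(size_t) = malloc;
void (*lib_free)(void *) = free;

int live_blocks = 0; /* number of blocks currently handed out */

/* Replacement allocator that counts outstanding blocks. */
void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Pretend library routines: they only ever allocate through the hooks. */
void *lib_compile(size_t size)
{
    assert(size > 0);
    return lib_malloc(size);
}

void lib_release(void *p)
{
    lib_free(p);
}

/* The caller installs its own routines before using the library. */
void install_counting_allocator(void)
{
    lib_malloc = counting_malloc;
    lib_free = counting_free;
}
```

In an assembler program the same idea applies: store the addresses of your own allocation routines in the two pointers before the first library call.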
Posted on 2003-05-14 14:40:41 by iblis
Here's an article on building a custom RegEx parser:

http://www.codeguru.com/cpp_mfc/RegEx.html


Thanks,
_Shawn
Posted on 2003-11-18 21:23:59 by _Shawn
The HLA Standard Library (http://webster.cs.ucr.edu) contains a "pattern matching library" module that lets you do generalized pattern matching in assembly language (context-free patterns as well as regular expressions). Alas, this is *one* of the few modules that doesn't translate to MASM real well because it takes advantage of HLA's powerful macro facilities. Nevertheless, you might want to take a look at it. I've attached a "regular expression" example from the "Great Computer Language Shootout" to the end of this post (note, btw, that it includes the original C code as comments, followed by the HLA code). The cool thing here is that the assembly version is actually *shorter* (in terms of lines of code) than the C version.
Cheers,
Randy Hyde



// regexpGCLS
//
// This program demonstrates processing of regular expressions in
// assembly language. This is based on the "regexp.gcc"
// program that is part of "The Great Computer
// Language Shoot-out" found at
//
// http://www.bagley.org/~doug/shootout/
//
// The purpose of that web page is to demonstrate several
// applications written in various languages. Although one
// of the primary purposes of that web site is to demonstrate
// the different run-time efficiencies of various languages,
// this HLA implementation was not created to demonstrate
// that assembly code is faster or smaller (everyone pretty
// much accepts the fact that the fastest and/or smallest
// example of a program will be written in assembly language).
// Instead, this example demonstrates that with the use of
// a high level assembler (e.g., HLA), it's also possible to
// write code almost as easily as in a high level language
// like C. As such, this example freely sacrifices efficiency
// for readability.

#if( false )

/* -*- mode: c -*-
* $Id: regexmatch.gcc,v 1.4 2000/12/24 05:43:53 doug Exp $
* http://www.bagley.org/~doug/shootout/
*/

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <pcre.h>
#include <string.h>

#define MAXLINES 100
#define MAXLINELEN 132

char *pattern =
"(?:^|[^\\d\\(])" /* must be preceded by non-digit */
"(\\()?" /* match 1: possible initial left paren */
"(\\d\\d\\d)" /* match 2: area code is 3 digits */
"(?(1)\\))" /* if match1 then match right paren */
"[ ]" /* area code followed by one space */
"(\\d\\d\\d)" /* match 3: prefix of 3 digits */
"[ -]" /* separator is either space or dash */
"(\\d\\d\\d\\d)" /* match 4: last 4 digits */
"\\D" /* must be followed by a non-digit */
;


int
main(int argc, char *argv[]) {
int NUM = ((argc == 2) ? atoi(argv[1]) : 1);
int count;
char *cptr = "";
char **phones;
pcre *re;
int erroffset;
const char *errptr;
int n, lines = 0;
char num[256];
int i, j, k, matchlen;
char *matchoffset;
int nmatches;
int *ovec, ovecsize;
pcre_extra *study;

phones = (char **)malloc(MAXLINES * sizeof(char *));
if (!phones) {
fprintf(stderr, "malloc for phones array failed\n");
exit(1);
}
lines = 0;
while (cptr) {
phones[lines] = (char *)malloc(MAXLINELEN);
if (!phones[lines]) {
fprintf(stderr, "malloc to hold line #%d failed\n", lines);
exit(1);
}
cptr = fgets(phones[lines], MAXLINELEN, stdin);
lines++;
if (lines > MAXLINES) {
fprintf(stderr, "MAXLINES is too small\n");
exit(1);
}
}

re = pcre_compile(pattern, 0, &errptr, &erroffset, NULL);
if (!re) {
fprintf(stderr, "can't open compile regexp\n");
exit(1);
}

study = pcre_study(re, 0, &errptr);

if (pcre_fullinfo(re, NULL, PCRE_INFO_CAPTURECOUNT, &nmatches) != 0) {
fprintf(stderr, "pcre_fullinfo failed\n");
exit(1);
}
nmatches++; /* add match of entire pattern */

ovecsize = nmatches * 3; /* pcre_exec expects an element count, not a byte count */
ovec = (int *)malloc(sizeof(int) * ovecsize);
if (!ovec) {
fprintf(stderr, "malloc for ovec array failed\n");
exit(1);
}

count = 0;
while (NUM--) {
for (i=0; i<lines; i++) {
n = pcre_exec(re, study,
phones[i], strlen(phones[i]), 0,
0, ovec, ovecsize);
if (n == nmatches) {
/* stuff the match into the buffer "num" */
k = 2*2; /* initial offset into ovec */
/* areacode */
j = 0;
num[j++] = '(';
matchoffset = phones[i] + ovec[k];
matchlen = ovec[k+1] - ovec[k];
strncpy(num+j, matchoffset, matchlen);
j += matchlen; k += 2;
num[j++] = ')';
/* space separator */
num[j++] = ' ';
/* exchange */
matchoffset = phones[i] + ovec[k];
matchlen = ovec[k+1] - ovec[k];
strncpy(num+j, matchoffset, matchlen);
j += matchlen; k += 2;
/* dash */
num[j++] = '-';
/* last 4 digits */
matchoffset = phones[i] + ovec[k];
matchlen = ovec[k+1] - ovec[k];
strncpy(num+j, matchoffset, matchlen);
j += matchlen; k += 2;
/* with a cherry on top */
num[j] = 0;
if (0 == NUM) {
count++;
printf("%d: %s\n", count, num);
}
}
}
}

for (i=0; i<lines; i++) { /* only the first `lines` entries were allocated */
free(phones[i]);
}
free(phones);
free(ovec);

return(0);
}
#endif


program regexp;
#include( "stdlib.hhf" )
const
MaxLines := 100;

static
f :dword;
i :uns32;
filename :string;
lineCnt :uns32;
areaCode :str.strvar(16);
prefix :str.strvar(16);
suffix :str.strvar(16);
lines :string[ MaxLines ];

begin regexp;

if( arg.c() != 2 ) then

stdout.put( "Usage: regexp <filename>" nl );
exit regexp;

endif;
mov( fileio.open( arg.v( 1 ), fileio.r ), f );
mov( 0, ebx );
while( !fileio.eof( f )) do

fileio.a_gets( f );
mov( eax, lines[ ebx*4 ] );
inc( ebx );

endwhile;
mov( ebx, lineCnt );
fileio.close( f );
for( mov( 0, i ); mov( i, edx ) < lineCnt; inc( i )) do

pat.match( lines[ edx*4 ] );

pat.zeroOrMoreCset( -{ '(','0'..'9' } );
pat.zeroOrOneChar( '(' );
pat.exactlyNCset( {'0'..'9'}, 3 );
pat.extract( areaCode );
pat.zeroOrOneChar( ')' );
pat.zeroOrMoreWS();
pat.exactlyNCset( {'0'..'9'}, 3 );
pat.extract( prefix );
pat.oneOrMoreCset( {'-', ' '} );
pat.exactlyNCset( {'0'..'9'}, 4 );
pat.extract( suffix );

stdout.put( i:2,": (", areaCode, ") ", prefix, '-', suffix, nl );

pat.if_failure;

pat.endmatch;

endfor;

end regexp;
Posted on 2003-11-18 22:38:12 by rhyde

The cool thing here is that the assembly version is actually *shorter* (in terms of lines of code) than the C version.


HLA is a nice tool and I empathize with your zeal for promoting your brainchild. However, comparing it for source size on a one to one basis with C, especially in this particular instance, is a little unfair I feel.

The C version is larger for several reasons, for starters it rightly uses up its line count for things like error checking and comments, elements which seem to be missing from the HLA source. You also get to see all the gory details of initializing and setting up the PCRE structures, using the returned data pointers to extract the substrings etc, whereas the gory details of HLA's regexp initialization and such seem to be almost completely hidden. The level of abstraction is deeper certainly, but do realize that the same level of abstraction and simplification is also possible in C/C++ (especially C++), even if this particular author of the C source opted not to make use of it.

But at the risk of damning this thread to the Crusades forum, I will stop there. ;)
Posted on 2003-11-19 04:44:01 by iblis



HLA is a nice tool and I empathize with your zeal for promoting your brainchild. However, comparing it for source size on a one to one basis with C, especially in this particular instance, is a little unfair I feel.


Why?
C programmers get to take advantage of C features to reduce line counts, why cannot assembly programmers take advantage of their tools to do the same thing?


The C version is larger for several reasons, for starters it rightly uses up its line count for things like error checking and comments, elements which seem to be missing from the HLA source.


HLA uses exceptions to report errors. There is no need, for example, to check for memory allocation failures. HLA will raise an exception and report that error should it occur. Ditto for conversion errors. Go ahead and subtract the comments from the C code; it still has more lines of code.



You also get to see all the gory details of initializing and setting up the PCRE structures, using the returned data pointers to extract the substrings etc, whereas the gory details of HLA's regexp initialization and such seem to be almost completely hidden.

What HLA regexp initialization? Granted, you probably don't know much about the HLA pattern matching library, but there is no extraneous initialization going on. As for extracting the substrings, that's exactly what statements like:


pat.extract( areaCode );

are doing. All you're really complaining about is that it *does* take a small amount of study to understand HLA's pattern matching facilities. That's going to be true of any library routine you call (regular expression or otherwise).


The level of abstraction is deeper certainly, but do realize that the same level of abstraction and simplification is also possible in C/C++ (especially C++), even if this particular author of the C source opted not to make use of it.

Actually, having written the HLA Pattern Matching Library module, I can pretty much assure you that it would be quite difficult to do the same thing in C with anywhere near the performance level. One thing that this example does not demonstrate is that HLA's pattern matching facilities fully support back-tracking when processing patterns (like the SNOBOL4 and Icon programming languages). Supporting *that* efficiently in C is quite a bit of work; in HLA it's automatic.
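
To make "back-tracking" concrete for readers who have not met the term: when a later part of the pattern fails, the matcher returns to an earlier choice point and tries a different amount of input. The C sketch below is a minimal recursive matcher in the well-known textbook style, supporting only literal characters, '.', '*', '^' and '$'. It says nothing about how HLA implements the feature or how fast either approach is; the name tiny_match is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Try "c*" followed by the rest of the pattern. Each loop iteration is a
 * choice point: if the rest fails, consume one more `c` and retry. */
static int match_star(int c, const char *re, const char *text)
{
    do {
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match the pattern `re` against the beginning of `text`. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Returns 1 if `re` matches anywhere in `text`, 0 otherwise. */
int tiny_match(const char *re, const char *text)
{
    assert(re != NULL && text != NULL);
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

Matching "^a*ab$" against "aaab" shows the effect: the a* part has to settle on exactly two characters before the trailing "ab" can succeed.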


But at the risk of damning this thread to the Crusades forum, I will stop there. ;)


Yes, the OP asked for regexp library routines for assembly code. I provided the resource. An interesting artifact of the example is that it allows assembly programmers to write code that is shorter than a similar C program. Any further comments on that do belong in the Colosseum and I would ask the moderator to move these two posts there if any additional replies appear.
Cheers,
Randy Hyde
Posted on 2003-11-19 10:14:50 by rhyde
My point is that the source line count or the amount of typing that must be done by the user is not necessarily a result of the language used, but relies heavily on how much work is already done for said user.

I could pull all of that C code out of main() and put it in a library with a single exported function call called "DoRandallsRegexpThingy()" and then the user would only be required to write one line in main(). That is not to say that it would in any way be good programming practice to do so, but it illustrates my point. How fair would it be to compare that 1 line (excluding the necessary lines of course) program to the HLA program you posted? Not very. Those programs, both the C and the HLA examples, may produce the same output, but they use completely different methods and different levels of abstraction to get the job done. This is why you cannot justify the comparison you made.

A true "line count contest" as it were, between C and HLA would be to write this program, or any other program for that matter, without using any library calls except perhaps, those procedures which may be needed to communicate with the operating system to perform very basic IO operations.
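
As a rough idea of what such a "clean slate" entry might look like on the C side, here is a sketch that finds the phone number using nothing beyond <ctype.h> and <string.h>. It is not taken from the shootout; the function names are hypothetical, and unlike the PCRE pattern it accepts end-of-string as the trailing non-digit.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy exactly n digits from *p into out and advance *p.
 * Returns 1 on success; on failure nothing is written. */
static int take_digits(const char **p, int n, char *out)
{
    int i;
    for (i = 0; i < n; i++)
        if (!isdigit((unsigned char)(*p)[i]))
            return 0;
    memcpy(out, *p, (size_t)n);
    out[n] = '\0';
    *p += n;
    return 1;
}

/* Try to parse "(ddd) ddd-dddd" or "ddd ddd dddd" starting exactly at s. */
static int phone_at(const char *s, char area[4], char prefix[4], char suffix[5])
{
    int paren = (*s == '(');
    if (paren)
        s++;
    if (!take_digits(&s, 3, area))
        return 0;
    if (paren && *s++ != ')')        /* a '(' demands a matching ')' */
        return 0;
    if (*s++ != ' ')                 /* area code is followed by one space */
        return 0;
    if (!take_digits(&s, 3, prefix))
        return 0;
    if (*s != ' ' && *s != '-')      /* separator is a space or a dash */
        return 0;
    s++;
    if (!take_digits(&s, 4, suffix))
        return 0;
    return !isdigit((unsigned char)*s);
}

/* Scan a line for the first phone number. Returns 1 if one was found.
 * The output buffers may hold partial data when 0 is returned. */
int find_phone(const char *line, char area[4], char prefix[4], char suffix[5])
{
    const char *s;
    assert(line != NULL);
    for (s = line; *s != '\0'; s++) {
        /* A number may start at the beginning of the line or after a
         * character that is neither a digit nor '('. */
        if (s != line) {
            unsigned char prev = (unsigned char)s[-1];
            if (isdigit(prev) || prev == '(')
                continue;
        }
        if (phone_at(s, area, prefix, suffix))
            return 1;
    }
    return 0;
}
```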
Posted on 2003-11-19 14:58:23 by iblis
Originally posted by iblis
My point is that the source line count or the amount of typing that must be done by the user is not necessarily a result of the language used, but relies heavily on how much work is already done for said user.

Yes, that is an advantage of using HLA for pattern matching (e.g., regular expressions). Much has already been done for the user. And as it's a part of the HLA Standard Library, it's available for all (HLA) users any time they want it.


I could pull all of that C code out of main() and put it in a library with a single exported function call called "DoRandallsRegexpThingy()" and then the user would only be required to write one line in main().


If it were part of the C standard library, I'd buy that argument. But it's not. So it's not generally available to C programmers. Nor can C programmers call the HLA Standard Library pattern matching routines because they mess with the stack like you wouldn't believe. The fact that these routines *do* appear in a standard assembly language library means that it is fair for assembly programmers to call them.

The bottom line is "how much effort does it take to solve this problem in a given language".
Obviously, if the solution already exists (i.e., pulling the code out of main and turning it into a function call) then there is trivial effort involved. OTOH, such a trivial solution also has a limited domain of applicability (who else would be able to use that library routine you've created?). The HLA solution is general. I can match regular expressions, context-free grammars, and all sorts of patterns using the pattern matching library.

Line counts *don't* really matter. What matters is effort involved. And there's clearly less effort involved in writing the HLA code than in writing the C code for this particular application. And that's true even though the C code calls a non-standard library routine (i.e., you go out and find a regexp library on the net somewhere and decide to use that).



That is not to say that it would in any way be good programming practice to do so, but it illustrates my point. How fair would it be to compare that 1 line (excluding the necessary lines of course) program to the HLA program you posted? Not very. Those programs, both the C and the HLA examples, may produce the same output, but they use completely different methods and different levels of abstraction to get the job done. This is why you cannot justify the comparison you made.

While we're arguing on the side of "not fair", how about that regular expression library that the C code is *already* calling that's not part of the C standard library? Did you miss that? Already they've done what you claim can be done and it's still more work than the corresponding assembly code.


A true "line count contest" as it were, between C and HLA would be to write this program, or any other program for that matter, without using any library calls except perhaps, those procedures which may be needed to communicate with the operating system to perform very basic IO operations.


What makes that a true line count test? That's obviously skewed in favor of C as C supports arithmetic expressions and assembly does not. If you're going to eliminate the things that give HLA an advantage in certain application areas, it's quite clear that C will always win.

The bottom line is that C generally requires less effort than assembly (HLA or otherwise). But sometimes, assembly wins. The amusing thing about this discourse is that the argument is usually the other way around (people arguing that counting lines does not prove assembly is worse than C). Pattern matching is not an area where C/C++ is a particularly good fit. It's not surprising that other languages (including assembly) can beat it at this task. OTOH, take a language like SNOBOL4 and it'll blow both HLA and C right out of the water (in terms of line count). The cool thing about assembly is that with the appropriate library routines and macros, assembly can be coerced to behave well in just about *any* problem domain.

And, once again, I must point out that the OP asked for an assembly regexp parser. I provided one. They didn't ask for C code. Though the HLA Standard Library Pattern matching routines are not a "regular expression interpreter", it's very easy to specify any pattern using calls to these routines.
And if the programmer *really* wants to use a different assembler than HLA, they can always take your advice and code that pattern matching code in a simple HLA procedure and link that code into their other assembler (although you can call the HLA Pattern matching routines from any assembler, pushing the appropriate parameters for many of the routines is a bit of a bear because most assemblers don't provide a character set data type and other facilities that HLA has built-in).

Cheers,
Randy Hyde
Posted on 2003-11-19 15:28:38 by rhyde
Originally posted by randall.hyde
If it were part of the C standard library, I'd buy that argument. But it's not.

Well then the whole argument is invalid because PCRE is not a standard library. But, a library, standard or otherwise, does not a language make.

Originally posted by randall.hyde
So it's not generally available to C programmers. Nor can C programmers call the HLA Standard Library pattern matching routines because they mess with the stack like you wouldn't believe.

I'm sure that I or most anybody could write a C library to wrap the HLA pattern matching routines. As long as the calling stack is in the same state, and not overwritten, when the routine returns, then it would be a fairly easy task.

Originally posted by randall.hyde
The fact that these routines *do* appear in a standard assembly language library means that it is fair for assembly programmers to call them.

Standardized by what committee? I don't recall Microsoft releasing any such libraries with MASM, nor Borland with TASM. You might have deemed it an HLA Standard Library but let's please not call it a "standard assembly language library."
Does it matter if it's standard anyway? Having a standard library handy might save you a little time searching for the appropriate 3rd party library but, standard or otherwise - it all gets the job done.

Originally posted by randall.hyde
The HLA solution is general. I can match regular expressions, context-free grammars, and all sorts of patterns using the pattern matching library.

Right, and I can use any number of pattern matching libraries to do the same thing. With the right string class library, for example, pattern matching could be as easy as "string a = b.match(mypattern);" Of course that's C++ and not C, but an only slightly more obfuscated C version could be thrown together.
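
A sketch of what such a thin wrapper could look like in plain C follows. It is built on the POSIX <regex.h> calls rather than PCRE, purely so that it is self-contained; regex_capture is a hypothetical name, and the fixed limit of ten groups is an arbitrary assumption.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copies capture group `group` of the first match of `pattern` in `subject`
 * into `out` (group 0 is the whole match).
 * Returns 1 on success, 0 if there is no match or the text does not fit,
 * and -1 if the pattern fails to compile. */
int regex_capture(const char *pattern, const char *subject,
                  size_t group, char *out, size_t outsz)
{
    enum { MAX_GROUPS = 10 };        /* arbitrary limit for this sketch */
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int found = 0;

    assert(pattern != NULL && subject != NULL && out != NULL && outsz > 0);

    if (group >= (size_t)MAX_GROUPS)
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, MAX_GROUPS, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < outsz) {           /* leave room for the terminator */
            memcpy(out, subject + m[group].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }

    regfree(&re);
    return found;
}
```

With this in place, the caller's side of a match shrinks to one call plus a buffer declaration. Whether the wrapper's own lines should count toward the total is the question the thread is arguing about.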

Originally posted by randall.hyde
Line counts *don't* really matter. What matters is effort involved. And there's clearly less effort involved in writing the HLA code than in writing the C code for this particular application.

Are we not taking into account the proficiency of the individual programmer? Certainly that also has a lot to do with the effort involved. What would you have to say about the proficiency of the author of that C code?
And at the risk of sounding repetitive, I will reiterate that the effort involved relies heavily on the amount of effort that has already been done for the programmer. I'm sure you spent many long hard hours putting together HLA's pattern matching library, wrapping it up all nice and tight so that it would be very easy to use and require almost no effort for the user. The authors of PCRE might not have gone quite that far but that doesn't mean it's not possible to do.

Originally posted by randall.hyde
While we're arguing on the side of "not fair", how about that regular expression library that the C code is *already* calling that's not part of the C standard library? Did you miss that? Already they've done what you claim can be done and it's still more work than the corresponding assembly code.

See above replies.

Originally posted by randall.hyde
What makes that a true line count test? That's obviously skewed in favor of C as C supports arithmetic expressions and assembly does not.

How is that skewed in favor of C? Without anything but the most basic IO calls, each language will have a perfectly "clean slate" with which to start. If you want to compare libraries, then do so. But if you're going to compare two languages, then compare the languages, not the libraries.

Originally posted by randall.hyde
If you're going to eliminate the things that give HLA an advantage in certain application areas, it's quite clear that C will always win.

I'm not here to slam or bash or even belittle HLA. I don't know enough about it to do so, even if that was my intent. I only wanted you to recognize the error in making general comparisons about the effort involved in HLA and C.
Posted on 2003-11-19 16:38:35 by iblis
randall,

HLA uses exceptions to report errors.

Do you mean use of SEH? Can you report the exact error back to the calling routine and let it handle it, or do you just do a generic 'error' kind of thing?
Posted on 2003-11-19 17:07:28 by f0dder

randall,

Do you mean use of SEH? Can you report the exact error back to the calling routine and let it handle it, or do you just do a generic 'error' kind of thing?


Yes, SEH.
And yes, the calling routine can handle the exception or it can just pass it through.
In the case of simply aborting the program (as the original C code did), it's easiest just to let the run-time system deal with the exception. OTOH, if you want to deal with it yourself, that's quite easy, e.g.,



try
<<protected statements>>

exception( exceptionID )
<< code to handle specified exception >>

exception( anotherExceptionID )
<< more code >>
etc.

endtry;


Cheers,
Randy Hyde
Posted on 2003-11-19 17:56:19 by rhyde
hm, seems a bit inflexible to me? Or am I wrong when I assume you handle memalloc-returning-null by trapping the resulting pagefault when null memory is dereferenced? Unless I'm missing something, it seems like it will be somewhat hard for the application to figure out exactly what the error is - after all, pagefaults can be caused by a number of things :)
Posted on 2003-11-19 18:02:13 by f0dder
Originally posted by f0dder
hm, seems a bit inflexible to me? Or am I wrong when I assume you handle memalloc-returning-null by trapping the resulting pagefault when null memory is dereferenced? Unless I'm missing something, it seems like it will be somewhat hard for the application to figure out exactly what the error is - after all, pagefaults can be caused by a number of things :)


When the malloc function cannot allocate the requested storage, it raises an exception. It does not return NULL or any other value like that. As such exceptions rarely occur, it's far more efficient to raise an exception in those rare cases when an allocation failure does occur versus forcing the caller to test a return result on every call.

HLA, of course, supports all the normal hardware/OS exceptions, but it also gives applications the ability to raise their own exceptions. The malloc routine, for example, raises an ex.memoryAllocationFailure exception when it cannot allocate storage for a request. Of course, if you don't put any try..endtry statements in your code, then *any* malloc call can raise this exception and you won't have a clue who caused it. OTOH, you can place a try..endtry call around each and every malloc call (if you so choose) to narrow the location down. Then again, a better solution is just to run the code through OllyDbg when you're getting the failure and you can pinpoint the source of the problem real quick. Exceptions are better used for taking corrective action in the code rather than as an expensive form of an assert statement.
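
For C readers, the same trade-off can be approximated with setjmp/longjmp: the allocator never returns NULL, and the caller sets up a single recovery point instead of testing every result. The sketch below is only an analogy and is not how HLA implements its exceptions; xmalloc, raw_alloc, alloc_budget and build_table are hypothetical names, and alloc_budget exists solely so that a failure can be simulated.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

jmp_buf alloc_failure;   /* where control resumes when an allocation fails */
int alloc_budget = -1;   /* test hook: <0 means unlimited, 0 means fail now */

/* Underlying allocator; fails on purpose once the budget is used up. */
static void *raw_alloc(size_t n)
{
    if (alloc_budget == 0)
        return NULL;
    if (alloc_budget > 0)
        alloc_budget--;
    return malloc(n);
}

/* Like malloc, but never returns NULL: failure jumps to the handler. */
void *xmalloc(size_t n)
{
    void *p = raw_alloc(n);
    if (p == NULL)
        longjmp(alloc_failure, 1);
    return p;
}

/* Allocates a table of `rows` buffers and releases it again.
 * Returns 0 on success, 1 if any allocation failed. */
int build_table(size_t rows)
{
    /* volatile: changed between setjmp and longjmp, read in the handler */
    char **volatile table = NULL;
    volatile size_t done = 0;
    size_t i;

    assert(rows > 0);

    if (setjmp(alloc_failure) != 0) {
        /* single recovery point: release whatever was allocated so far */
        for (i = 0; i < done; i++)
            free(table[i]);
        free(table);
        return 1;
    }

    table = xmalloc(rows * sizeof *table);
    for (done = 0; done < rows; done++)
        table[done] = xmalloc(32);   /* no per-call NULL test needed */

    for (i = 0; i < rows; i++)
        free(table[i]);
    free(table);
    return 0;
}
```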
Cheers,
Randy Hyde
Posted on 2003-11-19 22:09:04 by rhyde
Originally posted by iblis
Well then the whole argument is invalid because PCRE is not a standard library. But, a library, standard or otherwise, does not a language make.

Though I agree with the sentiment of your statement, the modern view is that the *standard* library is a part of the language. Regardless of what academic language designers preach, however, the HLA Standard Library is an integral part of the HLA *system* (if you don't like the word *language*) and it's very rare to see any HLA code written that doesn't use the HLA stdlib (just like it's rare to find any C code that doesn't make C stdlib calls). In that sense, comparing the capabilities of the languages with their standard libraries is not too outrageous. Definitely not much beyond that, though. Otherwise we run into your complaint about burying the entire application into a "library" somewhere.


I'm sure that I or most anybody could write a C library to wrap the HLA pattern matching routines. As long as the calling stack is in the same state, and not overwritten, when the routine returns, then it would be a fairly easy task.


You could, conceivably, write an assembly function that makes all the pattern matching calls to parse a particular pattern and returns match/no match back to C, but it is not possible to call the pattern matching routines directly from C. In order to support backtracking, the individual routines leave lots of stuff on the stack and this would mess up the C compiler considerably.



Standardized by what committee? I don't recall Microsoft releasing any such libraries with MASM, nor Borland with TASM. You might have deemed it an HLA Standard Library but let's please not call it a "standard assembly language library."

The "UCR Standard Library for 80x86 Assembly Programmers" was around long enough (and pushed by the 16-bit edition of AoA) that you could consider it a "standard assembly library." It was used by thousands of assembly programmers over the years. No, not an official standard, but the closest thing to a de facto standard that you're going to find.

And as the creator of the HLA language, I have every right to define what belongs in the HLA Standard Library. Just like Microsoft could define what belongs in a MASM standard library or Borland could have defined what was in a TASM standard library.

Now that HLA's Standard Library is available for MASM and FASM (probably with NASM coming along before too much longer), you're going to see a few more people using it. The fact that AoA/32-bits pushes the HLA Standard Library means that, in time, it's going to be just as accepted and used as the UCR stdlib package.


Does it matter if it's standard anyway? Having a standard library handy might save you a little time searching for the appropriate 3rd party library but, standard or otherwise - it all gets the job done.

In the case of comparing languages, of course it matters. Otherwise, the arguments always degenerate to "well, I've published that application as a library module so it only takes one line of code to do it in language XYZ."
It is fair to include standard library modules defined for a language as you can reasonably expect those routines to be available wherever there is an implementation of that language. The same is not true for 3rd party libraries.



Right, and I can use any number of pattern matching libraries to do the same thing. With the right string class library, for example, pattern matching could be as easy as "string a = b.match(mypattern);" Of course that's C++ and not C, but an only slightly more obfuscated C version could be thrown together.


But that's the "I could write that library module..." attitude. If it doesn't already exist and you have to write it, you have to consider the cost of writing that module as part of the project's development effort (and, you have to tack on those lines of code in your KLOC count). The difference with HLA is that those routines are already written, documented, and tested. You can call them as-is.


Are we not taking into account the proficiency of the individual programmer? Certainly that also has a lot to do with the effort involved. What would you have to say about the proficiency of the author of that C code?
And at the risk of sounding repetitive, I will reiterate that the effort involved relies heavily on the amount of effort that has already been done for the programmer. I'm sure you spent many long hard hours putting together HLA's pattern matching library, wrapping it up all nice and tight so that it would be very easy to use and require almost no effort for the user. The authors of PCRE might not have gone quite that far but that doesn't mean it's not possible to do.


No, it's quite possible to do the same things as the HLA pattern matching code in C (though probably not as efficiently). OTOH, the fact that it's not in the C/C++ standard library means that such code will not be known to most people who might want to use it. Rather than bother searching for a decent library routine to do this stuff (then figure out if it satisfies their needs), they just use Perl, Awk, or some similar string based language :-)



How is that skewed in favor of C? Without anything but the most basic IO calls, each language will have a perfectly "clean slate" with which to start. If you want to compare libraries, then do so. But if you're going to compare two languages, then compare the languages, not the libraries.


Easy. HLA includes lots of library routines to handle things that are built into languages like C. For example, consider accessing elements of multi-dimensional arrays. Trivial in a language like C; hairy in "pure" assembly language. OTOH, if you use the HLA array.index macro, accessing elements of an n-dimensional static or dynamically allocated array is a piece of cake. Is it fair to force the assembly user to manually emit this code by hand while the C programmer can simply write "A[i][j]"? Again, as most HLA users make heavy use of the HLA Standard Library, I argue that it's perfectly fair to consider invocations of the array.index macro as this macro is available in every implementation of HLA.
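
For reference, the arithmetic hidden behind a multi-dimensional subscript is the row-major offset calculation sketched below in C. This is the computation a C compiler performs for you and that an assembly programmer either writes out by hand or obtains from a macro; the function name is hypothetical and the sketch does not reproduce HLA's array.index macro.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major element offset for an n-dimensional array.
 * dims[d] is the size of dimension d, idx[d] the index into it. */
size_t row_major_offset(const size_t *dims, const size_t *idx, size_t n)
{
    size_t off = 0;
    size_t d;

    for (d = 0; d < n; d++) {
        assert(idx[d] < dims[d]);        /* bounds check, one per dimension */
        off = off * dims[d] + idx[d];    /* Horner-style accumulation */
    }
    return off;
}
```

For a 2 x 3 x 4 array, element [1][2][3] lands at offset (1*3 + 2)*4 + 3 = 23; the byte address is that offset times the element size, added to the base address.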


I'm not here to slam or bash or even belittle HLA. I don't know enough about it to do so, even if that was my intent. I only wanted you to recognize the error in making general comparisons about the effort involved in HLA and C.


Who was making general comparisons? I was simply pointing out that in this one special case (pattern matching), it takes fewer lines of code to implement the solution in HLA than in C. Clearly, the reverse situation is the more common case.

Here are some additional areas where HLA does a better job than C/C++:

iterators: HLA has true iterators and a foreach loop. Those things that C++ programmers like to call iterators aren't even close. I can write a program containing a "foreach" loop that traverses a binary tree using pre-order, in-order, or post-order algorithms with fewer lines of code than I've ever seen in C or C++.

delayed parameter evaluation: not even possible in C/C++. HLA fully supports call by name and call by lazy evaluation parameters. It also supports thunks. Very important for many AI types of projects.

value/reference parameters: HLA fully supports value and reference parameters of *all* types. C/C++ does not.

macro processing: hey, the C/C++ preprocessor is the most pitiful example of a macro processor that could be written and still be called a macro processor. It is no wonder that C programmers tend to shy away from using macros when they begin learning assembly language - they've been heavily prejudiced by the problems with the C preprocessor (it *can* be done right in a HLL, check out Dylan sometime).

And, of course, all the normal things where assembly excels over C (such as complete control over the order of evaluation of arithmetic expressions, multiprecision operations, etc.).
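
The multiprecision point is easy to illustrate. On x86 an extended-precision addition is an add followed by a chain of adc instructions, with the carry flag doing the bookkeeping. Portable C has no access to that flag, so a sketch like the one below (hypothetical name mp_add, 32-bit limbs stored least significant first) widens each step to 64 bits to recover the carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers (least significant limb first) into sum.
 * Returns the carry out of the most significant limb (0 or 1). */
uint32_t mp_add(uint32_t *sum, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    size_t i;

    assert(sum != NULL && a != NULL && b != NULL);

    for (i = 0; i < n; i++) {
        /* widen so the carry shows up in bit 32 */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}
```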
Cheers,
Randy Hyde
Posted on 2003-11-19 22:36:50 by rhyde