Discussion:
[fpc-other] What makes a Compiler project (like FPC) special?
n***@z505.com
2017-05-26 04:30:29 UTC
Permalink
This is directed at Florian primarily, but any other FPC core member
is welcome to chip in.
Since Florian mentioned that a compiler project is "rocket science"
[not his direct words, but he hinted at it] and totally different from
any other software project... it has really bugged me. Why is it
different, and what is different?
Sorry, I don't speak as a core member, but it is in fact rocket science.

If you look at the compiler sources and try to understand what is going
on, it's either:
- a mess
- rocket science
- both
Because there are so many things to take care of, probably more than in a
spaceship... after all, spaceships that have computers on board require a
compiler to compile the code that runs on them. So IMO compilers are more
important than spaceships and rockets themselves, but that is speaking
from an alien perspective, as I never was from this planet Earth.
_______________________________________________
fpc-other maillist - fpc-***@lists.freepascal.org
http:/
Florian Klämpfl
2017-05-25 20:20:04 UTC
Permalink
So what is Florian going on about regarding workflow and Git not being able to cope in a "compiler"-based
project? He made it out as if FPC will not be workable in a Git-managed environment. I don't
see his analogy. The Linux kernel, running on more platforms than FPC does, is just as complex a
beast,
It is not complex in the sense Nikolay described.
I see no proof to
convince me otherwise
- a compiler is just a complex project.
Nothing "special" as he claimed it to be.
See, this is the reason why I do not believe you. You simply cannot understand the problems of a
compiler project, which requires small linear steps that can be easily reviewed.

And it seems LLVM likes these small steps as well:
http://lists.llvm.org/pipermail/llvm-dev/2011-July/041781.html

Marco van de Voort
2017-05-25 21:52:08 UTC
Permalink
Yet the “packages” and “rtl” directories are just that - which, by the way,
are part of the FPC project.
Yes, except some of the parts directly connected to the compiler and its
features (like exceptions, RTTI etc)
And that is also where most commits have been going - based on the history
I queried for the last 4-6 months.
That overview is skewed by a high amount of work done on pas2js by Michael
and Mattias. It is atypical, and strictly speaking pas2js is ALSO a
compiler.

Nikolay and Karoly (plus the rest of the Amiga committers) have been consistently
active this cycle though.

Graeme Geldenhuys
2017-05-26 08:05:37 UTC
Permalink
We try to keep FPC layered
and everything; nevertheless, the unit dependency graph looks terrible - see attachment.
Thanks for the information Florian. Just curious, what tool did you use
to generate that graph?

Regards,
Graeme

Marco van de Voort
2017-05-25 21:04:17 UTC
Permalink
Just to be clear, I'm not pushing Git here - I know you guys will
not change - Florian made that very clear.
Yes, boundless leaps of faith are out of the question. Git should be a tool,
not a religion.
But Florian's statements just bugged me, and I see no proof to
convince me otherwise - a compiler is just a complex project.
Nothing "special" as he claimed it to be.
I do think Nikolay's point of it being more interconnected describes it
fairly well. There are no narrow interfaces that are natural seams for
modularization inside the compiler.

Nikolay Nikolov
2017-05-25 21:21:46 UTC
Permalink
Post by Marco van de Voort
There are no narrow interfaces that are natural seams for
modularization inside the compiler.
Yet the “packages” and “rtl” directories are just that - which, by the
way, are part of the FPC project.
"packages" - yes. But "rtl" is a lot more tightly coupled to the
"compiler" than you think.

Nikolay
n***@z505.com
2017-05-26 06:36:15 UTC
Permalink
Post by Nikolay Nikolov
Post by Marco van de Voort
There are no narrow interfaces that are natural seams for
modularization inside the compiler.
Yet the “packages” and “rtl” directories are just that - which, by the
way, are part of the FPC project.
"packages" - yes. But "rtl" is a lot more tightly coupled to the
"compiler" than you think.
Indeed, all the string routines and the reference counting are tied to the
compiler.
I learned this when I had an idea for a new string type.

A string(1000) instead of string(255) or string(arbitrary): a fixed-length
string that can go beyond 255 characters. Oberon has it. When you introduce
a new type into the RTL that is not a class but part of the language itself,
you have to add all the routines in sysutils to deal with this type, change
the compiler, and make overloaded string routines.

And system.pp is tightly intertwined with the compiler and is almost a
run time for the language. Not so much as in plain C, but probably even
plain C has some connections between the include files and the compiler.

Then there is writeln, which is tied to the compiler: it is like varargs,
but accepts multiple types. So if you introduce a new string(1000) into
the compiler, writeln also has to be modified to accept this new type as
a parameter. And not just writeln - other things too.
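That ripple effect can be sketched roughly like this (an illustrative Python model, not FPC's actual RTL code; the helper names are made up): the compiler lowers a writeln call into one runtime helper call per argument type, so every new built-in type means a new helper plus matching overloads everywhere.

```python
# Toy model of compiler-lowered writeln: one runtime helper per built-in
# type.  (Hypothetical names; not FPC's real RTL interface.)

def write_shortstring(out: list, s: str) -> None:
    out.append(s)

def write_integer(out: list, i: int) -> None:
    out.append(str(i))

# A new string(1000) type would need its own write_string1000 helper here,
# plus matching overloads in all the sysutils-style string routines.

def lowered_writeln(out: list) -> None:
    """Roughly what the compiler might emit for: writeln('count = ', 42)"""
    write_shortstring(out, 'count = ')
    write_integer(out, 42)
    out.append('\n')

buf = []
lowered_writeln(buf)
print(''.join(buf), end='')  # count = 42
```

The point of the sketch: the dispatch happens at compile time, inside the compiler, so the type list is closed - which is exactly why adding a language-level type touches both the compiler and the RTL.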

If you don't have this intertwined and tightly integrated system, you end
up with a Lisp-like system where everything is defined not by the
compiler/RTL but by the libraries/modules, which change the language at
run time. Powerful, but also a double-edged sword that makes the language
redefinable - open for abuse and misuse, such as Lisp's ability to
basically reshape the entire language into anything you want, as long as
you include (some(brackets)); everything else is up for grabs.
n***@z505.com
2017-05-26 06:28:22 UTC
Permalink
Post by Marco van de Voort
But Florian's statements just bugged me, and I see no proof to
convince me otherwise - a compiler is just a complex project.
Nothing "special" as he claimed it to be.
I do think Nikolay's point of it being more interconnected describes it
fairly well. There are no narrow interfaces that are natural seams for
modularization inside the compiler.
I've always wanted to modularize the FPC compiler in such a way that you
could use it not just for compilation, but also for checking/parsing the
code for other purposes, e.g. fpdoc.
But this would likely make the compiler slower.

Another modular-compiler idea is that you could embed a relational
database language inside FPC as a plugin, such as SQL or TutorialD, but
these are just pipe dreams - and a plugin/modular compiler would likely
be slower to compile code, since there are more things the compiler
has to choose between, more wrappers. But I don't know for certain.

There was also that strange, but rejected, object-oriented plugin
compiler that someone was working on at one time... Was it Dodi? I forget
his name. Maybe the Delphi decompiler guy?
Sven Barth via fpc-other
2017-05-26 10:53:51 UTC
Permalink
So what is Florian going on about regarding workflow and Git not being able
to cope in a "compiler"-based project? He made it out as if FPC will not be
workable in a Git-managed environment. I don't see his analogy. The Linux
kernel, running on more platforms than FPC does, is just as complex a beast,
if not more - considering that the Linux kernel probably has tens of
millions of lines of code and 2000+ contributors. The same could be said for
the KDE and Qt frameworks. The latter runs on just about every platform out
there and has multiple rendering engines, font engines, theme engines,
layout engines, etc.
The workflow of the Linux kernel would simply not be acceptable for us,
with its hierarchy of maintainers, Linus at the top, doing all the final
merging. If anything, an approach like Qt's, with automatic continuous
integration of pull requests committed by core devs or forwarded by them
from non-core devs, would be the way to go for us - if at all.

Regards,
Sven
Paul Robinson
2017-05-27 04:59:12 UTC
Permalink
Graeme Geldenhuys asked in Vol 108, Issue 27, "What makes a compiler project special?"
Well, I'm not a member of the FPC team, but I've worked on several compilers, so I'll throw my 0.02 Euro into the discussion.
Since Florian mentioned that a compiler project is "rocket science" [not his
direct words, but he hinted at that] and totally different to any other software
project... It has really bugged me... Why is it different, and What is different?
I'm going to have to disagree here, and it may simply display my own ignorance of the subject, but, then again, even a stopped clock is right twice a day.
A compiler is a "language processor": an application that converts code in one language into something else. If it's a translating compiler, it converts the code to another language. If it's a language compiler, it converts it to binary code or potentially to assembly language. (I'm making a bit of a distinction here: a compiler that translates to assembly code isn't a "translator", because it uses the assembler to save some of the work - it doesn't "reinvent the wheel" by writing its own object files - and because compilers generating assembly usually produce a finished output requiring no manual intervention. Most translators that change source from one high-level language to another produce results that often require manual correction. Few translators produce "perfect" high-level-to-high-level conversions without some work: they'll do the "heavy lifting", but minor "tweaks" or checking by a person is often required.)

At its core, a language processor is a text-processing application. It takes a fixed set of rules on what the programmer can and must "say" in order to specify the particular actions they want a program to accomplish. Given these rules, which are called "grammars", the programmer describes the program, and the compiler takes that description and turns it into the target representation of that description.
In the case of a translator, it produces a new program in a different language. Or it may be the same language but converted to a different dialect, such as a translation from a different Pascal compiler, a conversion from HP Cobol to IBM mainframe Cobol, or a conversion from C or Fortran to something newer.
Most language processors have gone over to using parser generators in order to reduce the work involved in scanning a source language; some still do the scanning directly. Most older Pascal compilers used "symbol substitution": as the source was scanned, the scanner would create a symbol identifying what had been found - an unrecognized word (which would indicate a user identifier), a symbol (like :, >, /, comma, etc.) or a keyword (USES, UNIT, BEGIN, etc.). The internal "current symbol" was then set to the value of that symbol.

Most compilers had about one byte of lookahead so they could determine whether they had a single-byte symbol (comma, ^, or ') or a multibyte symbol whose meaning depends on the second byte (> followed by an identifier vs. >=, : vs. :=, < vs. <> or <=). All of this was reasonable until object orientation came into use.
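The one-byte lookahead can be sketched in a few lines (a toy scanner for illustration, not FPC's actual one; the symbol set is abbreviated):

```python
# Toy symbol scanner with one character of lookahead: after seeing ':' or
# '<', peeking at the next character decides which symbol it is.

TWO_CHAR = {':=', '<=', '<>', '>='}

def scan_symbols(src: str) -> list:
    """Return the punctuation symbols in src, longest match first."""
    symbols, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace() or ch.isalnum():
            i += 1                    # skip identifiers, numbers, blanks
            continue
        pair = src[i:i + 2]           # one character of lookahead
        if pair in TWO_CHAR:
            symbols.append(pair)      # multibyte symbol: :=, <=, <>, >=
            i += 2
        else:
            symbols.append(ch)        # single-byte symbol: ',', '^', ':', ...
            i += 1
    return symbols

print(scan_symbols('x := 1; y < 2; z <> 3'))  # [':=', ';', '<', ';', '<>']
```

The same shape scales up: real scanners just have a larger symbol table and track positions for error reporting.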
When you use a variable or a constant, which one are you using? Well, it depends on the "scope." If you have I defined in the main program (or in the definitions of a UNIT), your reference is to that one. If you're inside a procedure, function, method or other similar construct and you define I there, it uses that one. But what if your program - or UNIT - uses several other UNITs, each having a variable I defined: which one does it use? The first one? The last one?
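The scope question can be modeled as a symbol-table lookup that walks scopes from the innermost outward (a toy model; in Pascal the unit listed last in the uses clause typically takes precedence, but the exact rules belong to the compiler, not to this sketch):

```python
# Toy scope resolution: scopes are ordered innermost first, and the first
# scope containing the name wins, so a local I shadows the program-level I.

def lookup(name: str, scopes: list) -> str:
    """scopes is a list of (scope_name, symbol_set), innermost first."""
    for scope_name, symbols in scopes:
        if name in symbols:
            return scope_name
    raise NameError(name)

# Hypothetical program with "uses UnitA, UnitB" - units appear here in
# reverse uses-clause order, since the later unit shadows the earlier one.
scopes = [
    ('procedure Foo', {'i'}),          # innermost
    ('program Main',  {'i', 'j'}),
    ('unit UnitB',    {'i', 'k'}),
    ('unit UnitA',    {'i'}),
]

print(lookup('i', scopes))  # procedure Foo  (the local wins)
print(lookup('k', scopes))  # unit UnitB
```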
Now the plot thickens if you reference an object. An identifier in that object can be fixed or virtual, in which case it may not be certain until execution time which one is being used: a variable or procedure in the base class, or an overridden one in a descendant object. So a compiler has to read the tables in a unit in order to discover what items are visible and where they are in that unit, and also to know what kind of variable (or procedure, or function) each is and what is legal to do with it (you can't add a 64-bit integer into an 8-bit unsigned byte because they're not compatible, but you can do it the other way around).

But this is still the translation of symbols and the assignment of attributes to them: whether something is a standalone item (like a unit), a dependent item (like a variable in a program) or an internal item (like a field in a record or a member of an object). It requires you to keep information about all these things, but I don't think this is any worse than the work a video game does in holding state about the game map, the player character (PC), non-player characters (NPCs), enemies, objects the player can hold (guns, Portal Device, radio) or use or consume (money, ammo, health).
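That attribute bookkeeping can be sketched as a small symbol table plus a direction-sensitive assignability check (made-up rules for illustration only; FPC's actual conversion rules are more involved):

```python
# Toy symbol table: each symbol carries a kind, and variables carry a type
# described as (family, bit_width, is_signed).

SYMBOLS = {
    'MyUnit': {'kind': 'unit'},
    'Count':  {'kind': 'var', 'type': ('int', 64, True)},   # signed 64-bit
    'Flags':  {'kind': 'var', 'type': ('int', 8, False)},   # unsigned byte
}

def assignable(dst, src) -> bool:
    """Can every value of src fit in dst?  Widening is safe, narrowing isn't."""
    _, dbits, dsigned = dst
    _, sbits, ssigned = src
    if dsigned == ssigned:
        return dbits >= sbits
    # a signed destination can hold an unsigned source only with room to spare
    return dsigned and dbits > sbits

# byte into int64 is fine; int64 into byte is not
print(assignable(SYMBOLS['Count']['type'], SYMBOLS['Flags']['type']))  # True
print(assignable(SYMBOLS['Flags']['type'], SYMBOLS['Count']['type']))  # False
```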
The last time I compiled the full compiler - on a reasonable machine, maybe a year or two ago - it was about 262,000 lines, probably not including the run-time libraries, and took an amazingly fast 13 seconds. In the end, it's still a text processor that attempts to take the explanation of what the programmer thinks the program is to do and translate it into a means of executing that explanation.
Even so, I'm sure it does not rise to the level of complexity of other types of applications involving other fields even if those programs are smaller in size. I suspect chemical analysis or actual programs involving real "rocket science" are considerably more complicated.
Let's put it at the level of a word processor, which might have to do a lot of similar things, such as process a document and redline the misspelled words, or even "compile" the formatted document into a PDF. But maybe that's too different a comparison, as word processors do other things to documents. However, I am trying to explain why a compiler application, while having some complexity, really isn't all that different from a typical "ordinary" application such as a word processor or other application most people deal with every day.
And is probably a lot less complex, too.
 
Paul
Paul Robinson <***@paul-robinson.us> - http://paul-robinson.us (My blog)
"The lessons of history teach us - if they teach us anything - that no one learns the lessons that history teaches us."
Nikolay Nikolov
2017-05-27 18:51:57 UTC
Permalink
Post by Paul Robinson
Graeme Geldenhuys asked in Vol 108, Issue 27, "What makes a compiler project special?"
Well, I'm not a member of the FPC but I've worked on several compilers
and I'll throw in my 0.02 Euro into the discussion.
Since Florian mentioned that a compiler project is "rocket science"
[not his
direct words, but he hinted at it] and totally different from any
other software
project... It has really bugged me... Why is it different, and What
is different?
I'm going to have to disagree here, and it may simply display my own
ignorance of the subject, but, then again, even a stopped clock is
right twice a day.
A compiler is a "language processor," an application that converts
code in one language into something else. If it's a translating
compiler it converts it to another language. If it's a language
compiler it converts it to binary code or potentially to assembly
language. (I'm making a bit of a distinction in that a compiler that
translates to assembly code isn't a "translator" because it is using
the assembler to save some of the work in not "reinventing the wheel"
and not having to create its own object file writer, and because
compilers generating assembly are usually creating a finished output
requiring no manual intervention. Most translators that change source
from one (high level) language to another produce results that often
require manual correction. Few translators produce "perfect"
high-level to high-level conversions without some work. They'll do the
"heavy lifting" but often minor "tweaks" or checking is required by
the person.)
At its core, a language processor is a text processing application. It
takes a fixed combination of rules on what the programmer can and must
"say" in order to specify the particular actions they want a program
to accomplish. Given these rules, which are called "grammars" the
programmer describes the program and the compiler takes that
description and turns it into the target representation of that
description.
Well, a mathematician is also a text processor, therefore mathematics is
simple, right? :)

Nikolay
