this is a very good blog post: back to basics about strings in C and Pascal. i suggest you read it, and now i’ll tell you something else:
in Oberon, Wirth decided to give up on Pascal strings and use zero-terminated strings.
however, there is no need to walk the whole array to find out the length of the string. we don’t even need a separate field that holds the length: the compiler knows the length of a static string.
thus it can do compile-time checks. for example, we have the following Oberon source:
MODULE tst;
IMPORT Out;
VAR i: SHORTINT;
    str: ARRAY 1024 OF CHAR;
BEGIN
  FOR i := 0 TO LEN(str) DO
    Out.Int(i, 0); Out.Ln
  END;
END tst.
let’s try to compile it:
[2020-09-02 16:00:20] $ voc -m tst.Mod
tst.Mod Compiling tst.
9: FOR i := 0 TO LEN(str) DO
^
pos 102 err 113 incompatible assignment
Module compilation failed.
[2020-09-02 16:00:21] $
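the compiler knows that LEN(str) is 1024, which does not fit in a SHORTINT (presumably 8 bits wide in this voc configuration), so it rejects the loop at compile time.
voc compiles Oberon source to C, which is a good way to illustrate what happens. if we have:
str: ARRAY 16 OF CHAR;
then the output C code will be: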
static CHAR tst_str[16];
nothing more: no field for the length.
still, this:
FOR i := 0 TO LEN(str) DO
will translate to:
tst_i = 0;
while (tst_i <= 16) {
as i said, we know the length at compile time. it doesn’t matter whether you generate C or assembly: you already know the number, so you can put it into the output assembly code as well.
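to make the point outside of Oberon, here is a minimal C sketch of the same idea. LEN and count_slots are hypothetical names i made up for illustration, and the SCHAR_MAX check only imitates the voc error above under the assumption that SHORTINT is 8 bits wide; it is not what voc itself generates.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* SCHAR_MAX */
#include <stddef.h>   /* size_t */

/* LEN is a hypothetical helper, loosely modelled on Oberon's LEN():
   it yields the element count of a true array (not of a pointer). */
#define LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

static char str[16];

/* The length is a constant expression, so it can be checked at compile time. */
static_assert(LEN(str) == 16, "length is known to the compiler");

/* A check in the spirit of the voc error: would the length fit into an
   8-bit signed counter? For 16 it does; for 1024 this would not compile. */
static_assert(LEN(str) <= SCHAR_MAX, "loop counter type is too small");

/* Counts the loop iterations. The bound is a constant, so nothing
   scans the array for a terminating zero. */
size_t count_slots(void)
{
    size_t n = 0;
    for (size_t i = 0; i < LEN(str); i++) {
        n++;
    }
    return n;
}
```

changing the array size to 1024 should make the second static_assert fail at compile time, which is roughly what happened with the SHORTINT loop counter.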
when we write a function that receives strings, we do not need to know the length of the received string in advance; we just use ARRAY OF CHAR as an argument:
MODULE tst2;
PROCEDURE addStrs(VAR a, b: ARRAY OF CHAR);
BEGIN
  (* do smth useful here *)
END addStrs;
PROCEDURE test*;
VAR
  s0: ARRAY 32 OF CHAR;
  s1: ARRAY 64 OF CHAR;
BEGIN
  addStrs(s0, s1);
END test;
END tst2.
the resulting C code will be:
static void tst2_addStrs (CHAR *a, ADDRESS a__len, CHAR *b, ADDRESS b__len)
{
    /* here we think we can do smth useful */
}
void tst2_test (void)
{
    CHAR s0[32];
    CHAR s1[64];
    tst2_addStrs((void*)s0, 32, (void*)s1, 64);
}
as we see, the function also gets the lengths of the strings, and if we call LEN(a) we get the length without any calculation.
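here is a rough hand-written C sketch of what "something useful" could look like when the callee receives the capacity of each array next to its pointer. add_strs and OPEN_ARRAY are hypothetical names, not part of voc or its runtime; the sketch only assumes the (pointer, length) convention shown above.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical helper: expands one array argument into the
   (pointer, length) pair, the way the generated code above does. */
#define OPEN_ARRAY(arr) (arr), sizeof(arr)

/* Appends the zero-terminated text in b to the text in a.
   a_len and b_len are the capacities of the arrays, like a__len and b__len.
   Returns 1 on success, 0 if a is unterminated or the result was truncated. */
int add_strs(char *a, size_t a_len, const char *b, size_t b_len)
{
    size_t i = 0, j = 0;
    assert(a != NULL && b != NULL);
    while (i < a_len && a[i] != '\0') i++;   /* find the end of the text in a */
    if (i == a_len) return 0;                /* no terminator inside a */
    while (j < b_len && b[j] != '\0') {
        if (i + 1 >= a_len) {                /* only room for the terminator */
            a[i] = '\0';
            return 0;
        }
        a[i++] = b[j++];
    }
    a[i] = '\0';
    return 1;
}

/* Usage: the caller never computes or passes a length by hand. */
int add_strs_example(void)
{
    char s0[32] = "hello, ";
    char s1[64] = "world";
    return add_strs(OPEN_ARRAY(s0), OPEN_ARRAY(s1));
}
```

the capacity makes the bounds check cheap; finding where the text currently ends still needs a scan for the terminating zero, because the capacity and the text length are different things.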
now let’s see how dynamic strings work:
MODULE tst3;
PROCEDURE addStrs(VAR a, b: ARRAY OF CHAR);
BEGIN
  (* do smth useful here *)
END addStrs;
PROCEDURE test*(i: INTEGER);
VAR
  s: ARRAY 32 OF CHAR;
  ps: POINTER TO ARRAY OF CHAR;
BEGIN
  NEW(ps, i);
  addStrs(s, ps^);
END test;
END tst3.
now we have a static string s, and we allocate a dynamic string through its pointer ps.
let’s assume we don’t know the size of the ps^ string (^ means dereference): we receive the length of the allocated string as a function argument, so it is not known at compile time.
the first function remains unchanged; the second function gets translated like this:
static void tst3_addStrs (CHAR *a, ADDRESS a__len, CHAR *b, ADDRESS b__len)
{
    /* do smth useful here */
}
void tst3_test (INT16 i)
{
    CHAR s[32];
    struct {
        ADDRESS len[1];
        CHAR data[1];
    } *ps = NIL;
    ps = __NEWARR(NIL, 1, 1, 1, 1, ((ADDRESS)(i)));
    tst3_addStrs((void*)s, 32, (void*)ps->data, ps->len[0]);
}
__NEWARR is a somewhat more complicated function, which is part of the runtime.
but we can understand what it does: it allocates space on the heap, and the pointer ps we get now points to the struct, which has a data field and a len field.
this is runtime information, and in this case we do have to keep a separate field for the length of the string.
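a simplified C sketch of such a length-prefixed heap array could look like this. it only mirrors the idea of the struct above: CharArray, new_char_array, total_capacity and dynamic_example are hypothetical names, and the real __NEWARR takes more arguments and also cooperates with the garbage collector, which this sketch ignores.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */
#include <stdlib.h>  /* calloc, free */

/* Hypothetical layout: the length sits in a header in front of the data. */
typedef struct {
    size_t len;    /* number of elements, known only at run time */
    char data[];   /* C99 flexible array member */
} CharArray;

/* Allocates a zero-filled array of n chars; returns NULL on failure. */
CharArray *new_char_array(size_t n)
{
    CharArray *p = calloc(1, sizeof *p + n);
    if (p != NULL) p->len = n;
    return p;
}

/* Stand-in for addStrs: just reports the capacities it was given. */
size_t total_capacity(char *a, size_t a_len, char *b, size_t b_len)
{
    (void)a;
    (void)b;
    return a_len + b_len;
}

size_t dynamic_example(size_t i)
{
    char s[32];
    CharArray *ps = new_char_array(i);
    assert(ps != NULL);
    /* static length comes from the compiler, dynamic length from the header */
    size_t total = total_capacity(s, sizeof s, ps->data, ps->len);
    free(ps);
    return total;
}
```

so the callee sees the same (pointer, length) pair in both cases; only the place where the length comes from differs.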
that’s it.
#oberon #c #pascal #wirth #programming #programming-languages #programming_languages #design #implementation #vishap #voc #compiler #compilation #strings #string #heap #stack #storage #storage-management #storage_management #length
Instead of a flat string, Erlang — oops, did I say Erlang? I meant Elixir — creates a nested list containing four leaf elements that add up to the string we expect: “Hello “, “&”, “amp;”, and " Goodbye”. At first glance this may seem like a pointless complication, but let’s see what the computer thinks about the situation.
If you look at the man page for writev, you’ll see it’s a “gathered write”, which means it writes data from multiple memory locations in a single system call. I wrote a little DTrace script to unpack the writev call we saw earlier, and peek at what that Elixir code is actually doing with this system call.
https://www.evanmiller.org/elixir-ram-and-the-template-of-doom.html
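to get a feeling for what a gathered write is, here is a small C sketch that hands the four fragments from the quote to writev in one call, without ever concatenating them. write_fragments and print_greeting are hypothetical names; this is a plain POSIX illustration, not what the Erlang VM does internally.

```c
#include <assert.h>     /* assert */
#include <string.h>     /* strlen */
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* writev, struct iovec */
#include <unistd.h>     /* STDOUT_FILENO */

/* Writes the four fragments to fd with a single system call.
   Returns what writev returns: bytes written, or -1 on error. */
ssize_t write_fragments(int fd)
{
    /* The fragments live in separate buffers and are never joined. */
    static char hello[] = "Hello ";
    static char amp[]   = "&";
    static char rest[]  = "amp;";
    static char bye[]   = " Goodbye";
    struct iovec iov[4] = {
        { .iov_base = hello, .iov_len = strlen(hello) },
        { .iov_base = amp,   .iov_len = strlen(amp)   },
        { .iov_base = rest,  .iov_len = strlen(rest)  },
        { .iov_base = bye,   .iov_len = strlen(bye)   },
    };
    assert(fd >= 0);
    return writev(fd, iov, 4);
}

/* Usage: send the fragments to standard output. */
ssize_t print_greeting(void)
{
    return write_fragments(STDOUT_FILENO);
}
```

the kernel gathers the pieces in order, so the reader on the other side sees one contiguous string.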
#erlang #elixir #dtrace #write #writev #syscall #string #strings #programming
In order to allow concatenation of strings to be efficient in both time and space, it must be possible for the result to share much of the data structure with its arguments. This implies that fully manual storage management (e.g. based on explicit malloc/free) is impractical. (It can be argued that this is true even with conventional string representations. Manual storage management typically results in much needless string copying.) Though an explicitly reference counted implementation can be built, we will assume automatic garbage collection.
https://www.cs.rit.edu/usr/local/pub/jeh/courses/QUARTERS/FP/Labs/CedarRope/rope-paper.pdf
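a tiny C sketch of the sharing idea: concatenation allocates one small node that points at both arguments instead of copying their characters. the names (Rope, rope_leaf, rope_concat, rope_at) are my own and the layout is a simplification, not the Cedar implementation from the paper. nothing is ever freed here on purpose, since the paper assumes a garbage collector; in real C code that would be a leak.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */
#include <stdlib.h>  /* calloc */
#include <string.h>  /* strlen */

/* A rope is either a leaf holding flat text or a concat node with two children. */
typedef struct Rope {
    size_t len;                 /* total number of characters below this node */
    const char *text;           /* leaf only: the characters (not copied) */
    const struct Rope *left;    /* concat only */
    const struct Rope *right;   /* concat only */
} Rope;

static Rope *rope_alloc(void)
{
    Rope *r = calloc(1, sizeof *r);
    assert(r != NULL);
    return r;
}

const Rope *rope_leaf(const char *text)
{
    Rope *r = rope_alloc();
    r->text = text;
    r->len = strlen(text);
    return r;
}

/* Constant-time concatenation: both arguments are shared, not copied. */
const Rope *rope_concat(const Rope *l, const Rope *r)
{
    Rope *n = rope_alloc();
    n->left = l;
    n->right = r;
    n->len = l->len + r->len;
    return n;
}

/* Returns the character at index i by walking down to the right leaf. */
char rope_at(const Rope *r, size_t i)
{
    assert(i < r->len);
    while (r->text == NULL) {
        if (i < r->left->len) {
            r = r->left;
        } else {
            i -= r->left->len;
            r = r->right;
        }
    }
    return r->text[i];
}
```

because nodes are immutable, the same subtree can appear in many ropes at once, which is exactly why deciding who frees it by hand gets impractical.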
#paper #string #strings #rope #gc #storage
Every time a new assignment or parameter passing is made, the reference count of the String has to be increased, this is an atomic lock, and is related to memory management, so it’s there whether you’re using simple reference-counting or copy-on-write.
Under a GC, no atomic lock is required, a simple reference (pointer) has to be copied. This is very efficient, locally, but the memory management costs are just deferred to a later garbage collection phase. Since immutable strings don’t have reference to other objects, the GC for them can theoretically happen in parallel without any drawbacks (assuming the GC supports it).
So under a GC, an immutable String type makes a whole lot of sense, as implementing a copy-on-write one requires a lot of effort, and a mutable one is problematic multi-threading wise.
https://www.delphitools.info/2013/05/13/immutable-strings-in-delphi/
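to show where the atomic operation sits, here is a minimal C11 sketch of a reference-counted string. RcString, rc_new, rc_retain and rc_release are hypothetical names; this is not how Delphi lays out its strings, only an illustration of the cost the quote talks about.

```c
#include <assert.h>     /* assert */
#include <stdatomic.h>  /* atomic_size_t, atomic_fetch_add, atomic_fetch_sub */
#include <stdlib.h>     /* malloc, free */
#include <string.h>     /* strlen, memcpy */

typedef struct {
    atomic_size_t refs;  /* shared between threads, hence atomic */
    size_t len;
    char data[];         /* the characters plus a terminating zero */
} RcString;

/* Creates a string with one owner; returns NULL on allocation failure. */
RcString *rc_new(const char *text)
{
    size_t len = strlen(text);
    RcString *s = malloc(sizeof *s + len + 1);
    if (s == NULL) return NULL;
    atomic_init(&s->refs, 1);
    s->len = len;
    memcpy(s->data, text, len + 1);
    return s;
}

/* Every assignment or parameter pass pays for one atomic increment. */
RcString *rc_retain(RcString *s)
{
    assert(s != NULL);
    atomic_fetch_add(&s->refs, 1);
    return s;
}

/* Drops one reference; returns 1 if this was the last one and s was freed. */
int rc_release(RcString *s)
{
    assert(s != NULL);
    if (atomic_fetch_sub(&s->refs, 1) == 1) {
        free(s);
        return 1;
    }
    return 0;
}
```

under a GC, rc_retain would shrink to a plain pointer copy, which is the trade-off the quote describes.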
#pascal #string #strings #immutable_string #delphi #gc #programming