New Safe C++ Proposal

The following is from the knowledge and expertise of a self-taught programming hobbyist:

The more problems in code that can be caught at compile time, the better, even if it requires a bit of safety scaffolding to achieve.

An example: std::format before P2216R3 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2216r3.html).
A malformed format string compiled just fine, at least with VS2019/2022. Catching any possible boo-boos required try/catch blocks. Or you could watch the program crash, then debug to discover why and where everything fell down and went *BOOM!*

After this proposal was applied to the C++ standard, a malformed format string won't compile. No need for exception handling, at least for formatting.

The few examples of C++ Safety scaffolding I see in the proposal are somewhat akin to the Desktop WinAPI SAL notation: https://learn.microsoft.com/en-us/cpp/code-quality/understanding-sal?view=msvc-170

M'ok, no one is required to use SAL to document WinAPI code; if it is missing, the MSVC compiler merely whinges and still creates an executable.

The C++ Safety proposal merely ups the checking for potential run-time problems at what appears to be compile-time, reporting hazards before they are released into the wild. Using it won't likely be required, but like the C++ Core Guidelines it could be a damned good recommendation for creating more robust code with minimal fuss.

At least that is my take on what I read is being proposed. I could be wrong. ¯\_(ツ)_/¯

I'm comfortable using SAL for WinAPI code, and from what I've seen of how to use C++ Safety, it wouldn't be a huge burden to use going forward after adoption. Every C++ standard IMO changes the language and, for the most part, makes it better.

Even the back-and-forth over lambda capture of the this pointer between C++ standards.

cppreference wrote:
struct S2 { void f(int i); };
void S2::f(int i)
{
    [=] {};        // OK: by-copy capture default
    [=, &i] {};    // OK: by-copy capture, except i is captured by reference
    [=, *this] {}; // until C++17: Error: invalid syntax
                   // since C++17: OK: captures the enclosing S2 by copy
    [=, this] {};  // until C++20: Error: this when = is the default
                   // since C++20: OK, same as [=]
}

(Yes, I do understand the difference between *this and this, m'ok? And the reasoning behind the usage difference. I merely find it interesting lambda capture of this changed between standards.)

*Back to my lounge chair and pop-corn....*
Like I said, it's the type of nonsense dynamic-language fanatics say. Formal checking is much stronger than testing. Testing should only enhance it, not replace it.

The reason for testing plenty before deployment, in this case, is that shipping wrong code would be devastating. So you're testing to make sure your code works, not necessarily to make sure you haven't created UB.

If your code alters the data you're using and it's too large to keep a copy, then the code had better work before deployment.

While UB is its own class of errors, it's rarely sneaky. If you have an off by one error, for example, then your expected output will not match the output. Use vectors and safe pointers, then these issues are mostly impossible anyway.

Dynamic languages have no compile-time checking; they just wait for things to go wrong. This makes testing not only necessary, but you'd better test every nook and cranny of your code.


The C++ Safety proposal merely ups the checking for potential run-time problems at what appears to be compile-time

That would be "fine", but would not be enough to eliminate most UB the way Rust apparently has.
it's rarely sneaky. If you have an off by one error, for example, then your expected output will not match the output
You're wrong. It is often sneaky. It is quite easy to have incorrect code whose behavior becomes undefined only under certain conditions, or code that contains UB from the start, but due to the way the compiler has laid out memory does not trigger any failures and just sits there waiting to be exploited, or (most annoyingly) code that fails obviously but rarely and unpredictably.
You can't unit test against UB because it is UB. All behaviors, including returning correct results, are permissible. Even if you had 100% coverage you may not detect all existing UB in your program, due to statefulness and non-determinism.

I very much believe these safety issues are an existential threat to C++. If they're not addressed people may simply abandon the language. I love the language, and even I think it's foolish to code an Internet-facing service in it.
I love the language, and even I think it's foolish to code an Internet-facing service in it.

That would be complicated code, but there is some C++ wizard on youtube who I've seen work miracles with C++ doing all sorts of wild things like that.. from scratch.

You're wrong. It is often sneaky.

I mean it's sneaky when you suck. Use safe variable types and there isn't much sneaky UB that's possible.

I very much believe these safety issues are an existential threat to C++.

I wouldn't go that far. How C++ is perceived is more impactful than how serious the actual safety issues are. Same reason why people are scared of planes but not driving.

C++ gives you nearly all the tools built-in to avoid it if you choose to use them.

Just good programming practices alone followed perfectly would prevent all UB. If you make a few mistakes while using safe variable types, you're probably fine.

I mean, what does Rust really do other than forcibly prevent you from behaving a certain way? UB is not from the language; it's from the programmer.
Just good programming practices alone followed perfectly would prevent all UB.
Are you trolling me? Are you seriously using the most ignorant argument non-programmers use? "I don't get why bugs exist. Just program it properly, how difficult is it?!?!"
Why do you test? Just be careful and don't make mistakes. Then you can save time by skipping the tests!

I mean, what does Rust really do other than forcibly prevent you from behaving a certain way? UB is not from the language; it's from the programmer.
The only way a more strict compiler can prevent you from doing something is if what you were trying to do was unsafe to begin with. A JavaScript programmer is only stopped by TypeScript when what they were trying to do was nonsensical, like trying to pass a number instead of an object.
If what you're trying to do makes complete sense, more checking will have no effect on you. The checking is there to stop you from accidentally making a mistake, which you will make eventually.
Are you really that proudly inexperienced?

If you use safe pointers, safe variables, and you never touch the "dangerous" features, how poor a programmer would you have to be to still jump off the cliff into UB?

C++ can be completely safe.. It gives you all the tools to do so. The difference is you're not forced to use them.

Are you seriously using the most ignorant argument non-programmers use? "I don't get why bugs exist. Just program it properly, how difficult is it?!?!"

No, this is an axiom, not the end argument. People will make mistakes, but if you follow good programming practices without error, your mistakes will not lead to UB, only other errors.

The end argument is that Rust simply forces you into good practices.

The only way a more strict compiler can prevent you from doing something is if what you were trying to do was unsafe to begin with.

I haven't used Rust, so I wouldn't know the details. But I imagine there exists genuine code that seems dangerous, but handled correctly is fine.

I don't suppose pointer arithmetic is possible in Rust, too much power in the hands of the fleshy idiots! Yes, you could do it with an unsafe block, but this is simply bypassing the strict compiler which would have prevented you. So a strict compiler can prevent you from doing something safe, just because it has the capacity to be unsafe.

Moreover, in C++ pointer arithmetic can be a speed boost, which I wouldn't know if it translates to the same in Rust. I actually showed previously on here how accessing elements in an array with pointer arithmetic was faster in Visual Studio than the standard arr[i].


Again, nowhere am I going to say Rust is bad; I think it's good to have a safe alternative. The argument is that I personally wouldn't want it forced on C++ to wear the same hat.
If you use safe pointers, safe variables, and you never touch the "dangerous" features, how poor a programmer would you have to be to still jump off the cliff into UB?
I gave two examples of UB in the previous page that don't use any "dangerous" features. If you like, here's another:
std::deque<Token> tokenize(const std::string &);

unsigned eval(std::deque<Token> &tokens, int precedence = 0){
    unsigned ret = 0;
    while (!tokens.empty()){
        auto top = tokens.front();
        tokens.pop_front();
        if (top.is_number()){
            ret += top.number();
            continue;
        }
        assert(top.is_op());
        switch (top.op()){
            case '(':
                ret += eval(tokens);
                //throws if the top of tokens is not ')'.
                check_top(tokens, ')');
                tokens.pop_front();
                continue;
            case ')':
                return ret;
            case '+':
                if (precedence > 0)
                    return ret;
                ret += eval(tokens, 1);
                continue;
            case '-':
                if (precedence > 0)
                    return ret;
                ret -= eval(tokens, 1);
                continue;
            case '*':
                if (precedence > 1)
                    return ret;
                ret *= eval(tokens, 2);
                continue;
            case '/':
                if (precedence > 1)
                    return ret;
                {
                    auto temp = eval(tokens, 2);
                    if (temp)
                        ret /= temp;
                    else
                        ret = 0;
                }
                continue;
            default:
                throw std::exception();
        }
    }
    return ret;
}

//REPL without the L
void rep(){
    std::string line;
    std::getline(std::cin, line);
    auto tokens = tokenize(line);
    std::cout << eval(tokens) << std::endl;
}
The logic is almost certainly wrong, I didn't even test it. But ignoring that, can you tell why this has UB, which "dangerous" feature I'm using, and which rule of safe coding I'm breaking?

People will make mistakes, but if you follow good programming practices without error, your mistakes will not lead to UB, only other errors.
Read what you just wrote. Would "following good programming practices with some errors" not be a mistake? So what you're saying is, if people make mistakes, but they don't make mistakes, then they will not get UB. In other words, it's a vacuous truth.
If you're already accepting that people will make mistakes, you must also accept that they will not perfectly follow good practices all the time.

I don't suppose pointer arithmetic is possible in Rust, too much power in the hands of the fleshy idiots! Yes, you could do it with an unsafe block, but this is simply bypassing the strict compiler which would have prevented you. So a strict compiler can prevent you from doing something safe, just because it has the capacity to be unsafe.
And what ends up happening is the same thing that happened to goto. You redesign your solution and you realize you didn't even need to use unsafe code in the first place.

I forgot to respond to this earlier:
How C++ is perceived is more impactful than how serious the actual safety issues are.
That's exactly what I was saying. The fact that new languages exist that can match C++ in speed but don't have its security issues casts it in a bad light. Some places are already talking about banning its use. C++ needs to become safer it if hopes to survive another fifty years. Its current image among the people who don't use it is that of an antiquated, insecure language that's too slow to adapt to the times.
If you use safe pointers, safe variables, and you never touch the "dangerous" features, how poor a programmer would you have to be to still jump off the cliff into UB?

In my experience, it's quite easy to create UB by mistake.

int C = INT_MAX;
C++;
Can you tell why this has UB, which "dangerous" feature I'm using, and which rule of safe coding I'm breaking?

Well, since this code is unrunnable, it's not quite the same as someone who would be coding something real and making a mistake.

Moreover, there's plenty of issues that *could* exist, but you don't know without the token class. Is the variable in the Token class unsigned too? If not, then ret += top.number(); is overflowing. If it is, then you might get overflow in the Token variables themselves in "tokenize".

Just a lot of math being done with "ret", which is unsigned, and we don't know if the other values are compliant.

In this case, why are you using "unsigned" for math that clearly would be made for an int or double? That seems bad practice to me.

There might be other issues, but I doubt anything that wouldn't become apparent from running the code.

So what you're saying is, if people make mistakes, but they don't make mistakes, then they will not get UB.

I'm saying if people make mistakes, but not related to good coding practices, then the errors they encounter will be of a different nature than UB.

We already know that people make mistakes. The point is to show that C++ can be safe just from good practices alone.

Add that C++ also has many features, which if you just use, will eliminate most paths to UB. Then just follow good coding practices (which should now be easier and harder to screw up), and you're probably gonna be fine.

Can you screw up 10 * 2 - 5? Yes. But probably not.

And what ends up happening is the same thing that happened to goto. You redesign your solution and you realize you didn't even need to use unsafe code in the first place.

I'm actually not a fan of how criminalized it is to use goto. Not that long ago, I had written code that would have been very difficult to reorganize after I found that part of it needs to be repeated under a certain condition. I knew it would take at least 25 minutes worth of just moving code around and editing... or I could slap on a goto.

The world didn't explode and it was a more elegant solution than the alternative.

Just because you don't need it doesn't mean you shouldn't have it. I don't need ice cream, but a little bit doesn't hurt.

The fact that new languages exist that can match C++ in speed but don't have its security issues casts it in a bad light

I wouldn't mind opt-in safety - that's what C++ has always been doing. But you can't get rid of the dangerous stuff, C++ has always been backwards compatible.

Safer pointers that cannot cause UB? Sure, I'll use them when the code complexity is such that mistakes become more likely. But some short, mediocre, code that no one could ever screw up? I don't want or need to deal with safety.
In my experience, it's quite easy to create UB by mistake.

Overflowing variables feels like a gotcha. It's UB, but even Rust cannot stop overflow; it just gives it defined behavior. And it's up to you to check for it, whether in Rust or C++.
Moreover, there's plenty of issues that *could* exist, but you don't know without the token class. Is the variable in the Token class unsigned too? If not, then ret += top.number(); is overflowing. If it is, then you might get overflow in the Token variables themselves in "tokenize".

Just a lot of math being done with "ret", which is unsigned, and we don't know if the other values are compliant.

In this case, why are you using "unsigned" for math that clearly would be made for an int or double? That seems bad practice to me.

There might be other issues, but I doubt anything that wouldn't become apparent from running the code.
The issue is in the code I posted, and has nothing to do with arithmetic (by the way, unsigned += signed is defined), nor with interaction with any code I didn't post. There is no gotcha; the issue is obvious if you know what to look for. Last hint: you will not find any rule warning against my mistake in any guidelines on how to write C++.
Care to try again?

I'm saying if people make mistakes, but not related to good coding practices, then the errors they encounter will be of a different nature than UB.
If.

The point is to show that C++ can be safe just from good practices alone.
And my point is that good practices alone are not enough to ensure safety. People need support from compilers and runtimes. No amount of carefulness can replace the checks a compiler makes.

It's UB, but even Rust cannot stop overflow, it just gives it defined behavior.
Defined >>> Undefined.
But that aside, Rust also provides functions to do checked arithmetic that fail if the operation would overflow.


Question: Is it possible that someone who's been using C++ for almost 20 years knows a bit more about its shortcomings than you?
by the way, unsigned += signed is defined

Yes, but mixing them can cause overflow, especially when working with numbers from the user with no indication that negatives are not allowed.

If... And my point is that good practices alone are not enough to ensure safety

Again, an axiom and not the end argument. Since C++ is actually completely safe with good practices alone, C++ is much, much safer with all the new features they've introduced that completely disallow you from shooting yourself in the foot.

You always have the option though.

Question: Is it possible that someone who's been using C++ for almost 20 years knows a bit more about its shortcomings than you?

Is it possible when my dad gives me life lessons... he's actually right? Yea, but you realize that a person's amount of experience doesn't matter as much as the quality of their experience. Not to say your experience is not of high quality.

That's not related to my argument anyway, so the answer doesn't matter. It's not like I personally make decisions related to the future of C++ and I wouldn't want to be that person.

I've also made clear my bias. I don't think C++ being safer is bad. However, there are benefits to not being forced into safety and I personally like that aspect of C++.

And my point is that good practices alone are not enough to ensure safety. People need support from compilers and runtimes. No amount of carefulness can replace the checks a compiler makes.

And I'm saying such compiler and runtime checks exist.. you just have to use them. I don't mind if they add more, even, but they are opt-in.

There's been no pushback from you on my objective points:

C++ would no longer be backwards compatible as it has always been.
Things you can do in C++ and is perfectly safe (but has the capacity for UB) will disappear.
C++ will potentially become slower (as the more dangerous features are faster than the safe ones)

And what would the difference be between C++ and Rust at that point? If we already have Rust, do we need C++ to copy it?

And you can't argue that people wouldn't have to switch to Rust if C++ became better. This wouldn't be some update to your compiler then business as usual; the language would be completely different. It would be a switch, and code would have to be rewritten, many times even from scratch.

Forget losing market share in the future, C++ would lose users now if the language suddenly changed so drastically. You also can't really argue changing it one step at a time, as Every.Single.Step would ensure that the new C++ compiler is incompatible with all previous versions - a nightmare for anyone, and particularly any company, that uses it.

The solution? Opt-in safety features.. Like they already have and keep implementing.

I have not been arguing the practicality of a safer C++ in terms of C++ "disappearing" for being too unsafe as you claimed. That's because, objectively, a safer C++ is "better", but only when analyzing from specifically that viewpoint of safety.

And I don't make these decisions for C++, so it's even more pointless to argue about it, hence I've only given my personal feelings on the matter.

There is no gotcha; the issue is obvious if you know what to look for. Last hint: you will not find any rule warning against my mistake in any guidelines on how to write C++.
Care to try again?

"If you know what you're looking for" is sort of a gotcha. Plenty of code may cause issues, but is code that you would never write, hence you may not spot it in someone else's code.

The main reason I didn't look at your code thoroughly is that I'm used to debugging in Visual Studio; I like having variable highlighting and such. But obviously if I paste the code I'll have a bunch of errors highlighted, which is annoying.

It's also annoying because the logic itself of the program is incorrect as you pointed out:

case '(':
    ret += eval(tokens);
    //throws if the top of tokens is not ')'.
    check_top(tokens, ')');
    tokens.pop_front();
    continue;


It doesn't make sense, why would the top of the tokens be ')'? The function will return only if ')' has already been popped from the deque. However, assuming check_top correctly ensures there's at least some element in the deque, then pop_front() should be fine.

This all just makes it hard to give a serious eye to the code. I only looked at eval, but I looked at it from top to bottom and I don't see anything wrong.

There are instances of bad practice in this code, but I don't see anything that would lead to UB.

So please enlighten me wizard.
You'll have to excuse me if I don't reply to the rest of the post. We're just going around in circles.

It doesn't make sense, why would the top of the tokens be ')'? The function will return only if ')' has already been popped from the deque.
Like I said, I didn't even test it.

However, assuming check_top correctly ensures there's at least some element in the deque
Naturally.

There are instances of bad practice in this code
Such as?

So please enlighten me wizard.
eval() recurses based on data derived from unlimited user input. Nothing prevents the user from supplying a large enough string to completely fill the stack. Because C++ doesn't define program behavior on stack overflow, this could allow an attacker to craft an input that can take over the process.
Possible solutions:
* Pass a depth parameter to eval() that's incremented (or decremented) on each recursive level and throw past a limit.
* Limit the size of the input.
* Redesign eval() into an iterative algorithm.

Interestingly, I had to deal with this in my deserializer I mentioned earlier. Since I handle serialization of arbitrary object graphs, I couldn't use a recursive algorithm to reconstruct the graph at deserialization. That's why I allocate everything ahead of time, so I can set pointers inside constructors without having to recurse, by querying an associative array. Mine is the only de/serializer I've seen (at least for C++) that handles arbitrary object graphs. Cap'n Proto for example has a depth limit for graphs because it basically memory-maps the wire format to have nearly instant deserialization, and if it didn't have that restriction it would be vulnerable.
You'll have to excuse me if I don't reply to the rest of the post. We're just going around in circles.

Again, there are plenty of points that have never received a reply.

Such as?

Such as using unsigned for math where we don't know whether the values will be negative. Using integer division, which likely isn't how a person would intend for that division to occur. And yes.. even using recursion when an iterative solution is just as, if not more, convenient to write.

In fact, I recently programmed something similar.. Notice I didn't use recursion as an iterative solution made more sense:

void math(std::string &rightSide, const std::string& ops)
{
    double y = 0.0;
    size_t currentPos = 0;
    char currentOp = '+';

    while (currentPos < rightSide.length()) {
        size_t nextOpPos = rightSide.find_first_of(ops, currentPos);
        if (nextOpPos == std::string::npos) break;
        currentOp = rightSide[nextOpPos];

        auto terms = getTerms(rightSide, nextOpPos);
        y = std::stod(terms.first);
        double term = std::stod(terms.second);

        if (currentOp == '+') {
            y += term;
        }
        else if (currentOp == '-') {
            y -= term;
        }
        else if (currentOp == '*') {
            y *= term;
        }
        else if (currentOp == '/') {
            y /= term;
        }
        else if (currentOp == '^') {
            y = std::pow(y, term);
        }

        rightSide = replaceExpression(rightSide, nextOpPos, y);
    }
}


Just a note that I hate brackets on top, but when I let AI make changes to the code, it always returns everything with the brackets different, pretty annoying.

Nothing prevents the user from supplying a large enough string to completely fill the stack.

I see. I was only looking at the eval function.

That said, stack overflow usually results in a crash. Also, it is bad practice to not ensure user-input is safe in the context it's being used, as people learned the hard way in SQL.

Also, there is a bit of misdirection here:

The issue is in the code I posted, and has nothing to do with arithmetic (by the way, unsigned += signed is defined), nor with interaction with any code I didn't post. There is no gotcha;

If the issue isn't in arithmetic, but the input comes from the user, then we assume creating the tokens will not allow for negative values, but will allow thousands of tokens?

I was led to believe that "tokenize" would do everything correctly and provide data valid for the context it will be used in (the eval function). If this was not a safe assumption, then my original claim of overflow from unsigned mathematics is completely valid - as the user is free to input negative values.

This is also an OS issue more than a programming language issue, but yes C++ could perform checks for safety.


This is all not to even mention that the stack will probably never overflow. Recursion optimization avoids using the stack altogether. This can only overflow in a debugging context which would not be exploitable.
Again, there are plenty of points that have never received a reply.
Yeah. I made some of those, as well.

Such as using unsigned for math which we don't know will or will not contain negative values.
Not sure what you mean. What's a negative value in an unsigned context?

Using division with an integer, which likely isn't how a person would intend for that division to occur.
I guess that depends on the user's expectations. I would certainly be very surprised if I came across a C++ compiler that interpreted / as a constructor for rational values.

And yes.. even using recursion when an iterative solution is just as, if not more so, convenient to write.
That's hardly objective. Personally, I think recursive evaluation is more elegant than shunting yard.

Also, it is bad practice to not ensure user-input is safe in the context it's being used, as people learned the hard way in SQL.
Agreed, but my point still stands. You can hit UB without touching any "dangerous" features. I don't think you can argue against my example, unless you want to say that recursion and/or processing input are "dangerous". It seems problematic if basic control flow and the raison d'être of a program are inherently dangerous. And it is, hence my previous comment about coding Internet-facing services in C++.

If this was not a safe assumption, then my original claim of overflow from unsigned mathematics is completely valid - as the user is free to input negative values.
Yes, and tokenize will parse those correctly. If you input "-42" tokenize() will return ['-', 42], which eval() will evaluate, IINM, as 0-42, or std::numeric_limits<unsigned>::max() - 41. No idea what happens if you input "--42" or something, but the behavior should still be defined in that case with the function returning normally, even if the result is nonsensical.
Before you ask, I don't know what tokenize() does if you enter a number too large for unsigned. Pick a behavior and it is that.
I forgot to reply to this bit last time:
Yes, but mixing them can cause overflow.
It doesn't matter. Overflow on unsigned values is defined as wrap-around. The behavior may be surprising for the user, but it's still defined.

This is also an OS issue more than a programming language issue, but yes C++ could perform checks for safety.
I disagree. Memory-safe languages can handle stack overflows deterministically, without memory corruption, and before the OS gets involved. OSs can perform only the most conservative of sanity checks, when the program's behavior has already gone well past the point of reasonableness. It is precisely in that gray area after the program has stopped functioning correctly and before the OS can step in where vulnerabilities abound.

Recursion optimization avoids using the stack altogether.
Do you mean tail call optimization? TCO is not possible in this context, because there are further operations to be performed on the caller with the return value (i.e. there are no tail calls). TCO is only possible when the caller immediately forwards the return value (if any) to its own caller.
On top of that, in my experience TCO is rarely implemented.

This is all not to even mention that the stack will probably never overflow. [...] This can only overflow in a debugging context which would not be exploitable.
I assure you, if you give the program a million left parentheses, you'll overflow the stack.
Not sure what you mean. What's a negative value in an unsigned context?

If the user input was.. -1 * 2, let's say. If tokens allow for the negative, then this is an overflow when you go into eval and do all the math with unsigned.

Sure, defined overflow.

I don't think you can argue against my example, unless you want to say that recursion and/or processing input are "dangerous".

Yes, you did not use dangerous features, but I'm pretty sure the havoc user-input can cause has been a meme longer than I've been alive. This should definitely be on someone's radar.

And I would argue that processing user input is as dangerous. The user is the only thing that may actually be trying to destroy everything.

This is part of why I'd assume tokenize would sanitize the input, as this wasn't supposed to be a gotcha and I was supposed to assume everything else did not have a fault. I can't predict that it'll sanitize some things but allow thousands of parentheses in a row, without the actual complete code to debug.

If you input "-42" tokenize() will return ['-', 42], which eval() will evaluate

It is just as reasonable to assume it will return -42 as an integer token. If we assume the data is made for the function in mind, then that would return an error and peace is preserved.

The assumption was that the nature of the data could not be the cause. A small miscommunication, but one of the reasons I don't like tests that don't give you the full picture. There are so many possibilities and it's easy to not be on the same page.

On top of that, in my experience TCO is rarely implemented.

I believe it's implemented in Visual Studio, as code that gave me a stack overflow did not once release mode was enabled. I tested with an infinite recursion that went well beyond the normal limits and didn't crash.
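For what it's worth, the distinction hinges on where the recursive call sits. A sketch (hypothetical functions; whether a given compiler actually applies TCO, even in release mode, is a quality-of-implementation detail, not something the standard guarantees):

```cpp
// Tail-recursive: the recursive call is the very last operation, so an
// optimizing compiler *may* turn it into a loop (not guaranteed).
long long sum_tail(long long n, long long acc = 0) {
    if (n == 0) return acc;
    return sum_tail(n - 1, acc + n);   // tail call
}

// Not a tail call: the addition happens *after* the call returns, so the
// caller's frame must stay alive and TCO is impossible.
long long sum_plain(long long n) {
    if (n == 0) return 0;
    return n + sum_plain(n - 1);
}
```

In sum_plain the frame can't be discarded before the addition runs, which matches the earlier point that an eval() whose return value is still operated on has no tail calls to optimize.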

Do you mean tail call optimization? TCO is not possible in this context, because there are further operations to be performed on the caller with the return value

I wasn't sure if this was the case. I figured if all values were passed by reference, then that's fine, but there was one passed by value. However, it should be noted that this code will really only overflow from a malicious attempt to make it do so. There is no regular equation that would cause an overflow, as the recursion depth is based on the structure of the input.

And again, you should assume the worst of user-input - this is best practice.

Memory-safe languages can handle stack overflows deterministically, without memory corruption, and before the OS gets involved.

Well, I definitely don't disagree with you. This is the kind of thing that would also be invisible to the programmer, so it is not like I'm against stack protections. Such a protection, I think, would not affect backwards compatibility or anything else I spoke of before, either. They could also impose wrap-around behavior for signed variables, why not.

It's very different to disallow some UB which doesn't affect the coding experience versus disallowing all UB, which alters the language noticeably, in every way, for every C++ programmer.
Yes, you did not use dangerous features, but I'm pretty sure the havoc user-input can cause has been a meme longer than I've been alive. This should definitely be on someone's radar.
You'd be surprised. It's not uncommon for someone to find themselves trying to solve a problem while not knowing that they don't know what they're doing.
The example that always comes to mind is one client we had. If I had taken a look at their code I probably would have spotted this instantly, but we were only writing a secondary service that interfaced with theirs. They were representing Bitcoin values using the double 1.0 for 1 BTC. I'll remind you that 1 BTC is made up of 10^8 indivisible satoshis. I had a couple of fun times manually fixing up the single-satoshi errors in their DB.
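A sketch of that failure mode (hypothetical function names; assumes IEEE-754 doubles): truncating a double BTC amount to satoshis can land one satoshi low, because amounts like 0.29 aren't exactly representable in binary:

```cpp
#include <cmath>
#include <cstdint>

// 1 BTC = 10^8 satoshis. The nearest double to 0.29 is slightly below it,
// so 0.29 * 1e8 computes to just under 29000000 and truncation drops a
// satoshi; rounding recovers it. Safer still: never store money in
// floating point at all; keep integer satoshis end to end.
std::int64_t btc_to_sat_truncated(double btc) {
    return static_cast<std::int64_t>(btc * 1e8);   // drops the fraction
}

std::int64_t btc_to_sat_rounded(double btc) {
    return std::llround(btc * 1e8);                // nearest satoshi
}
```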

This is part of why I'd assume tokenize would sanitize the input, as this wasn't supposed to be a gotcha and I was supposed to assume everything else did not have a fault. I can't predict it'll sanitize some things but allow thousands of parenthesis in a row without the actual complete code to debug.
It would be highly unusual for a tokenizer to do any but the most rudimentary of sanitizations. A tokenizer doesn't have enough context to know how much input is too much; it's just supposed to raise the structuredness of the data by one level, and fail if that's not possible, usually because a sequence of characters is not a token.
It would be like expecting a UTF-8 decoder to clean up your data. It's just a conversion function.
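A minimal sketch of that "one level up" conversion (my own hypothetical tokenize, not the one from the earlier snippet): characters in, tokens out, with a failure on anything that isn't a token, and no opinion at all about nesting depth or input length:

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// A token is either an operator/parenthesis character or a number.
using Token = std::variant<char, int>;

// Raises characters to tokens; returns nullopt if some character is not
// part of any token. Deliberately does NOT limit nesting or length --
// that's not a tokenizer's job.
std::optional<std::vector<Token>> tokenize(const std::string& s) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) { ++i; }
        else if (std::isdigit(c)) {
            int v = 0;
            while (i < s.size() &&
                   std::isdigit(static_cast<unsigned char>(s[i])))
                v = v * 10 + (s[i++] - '0');
            out.push_back(v);
        }
        else if (std::string("+-*/()").find(s[i]) != std::string::npos) {
            out.push_back(s[i]); ++i;
        }
        else return std::nullopt;   // not a token: fail, don't "fix"
    }
    return out;
}
```

On "-1 * 2" this yields the separate tokens '-', 1, '*', 2, matching the earlier point that a '-' and a number come out as two tokens, not a negative integer.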

I'll grant you that not including the code for tokenize() did make it more difficult to find the fault, but I wasn't trying to test your skills, I was making a point. Regardless of the behavior of tokenize(), the fact remains that nothing intrinsic to eval() prevents it from triggering UB.

And again, you should assume the worst of user-input - this is best practice.
I think there's an even better practice: don't use a language that can get you owned when processing untrusted input.
It's not uncommon for someone to find themselves trying to solve a problem while not knowing that they don't know what they're doing.

*Raises hand*

Been there, done that, on occasion still stumbling around....
We all have. There will always be things you don't know you don't know. That's why I'm so insistent on my point that the language needs to do more to help the programmer. It's not like C++ doesn't have features that will forbid you from screwing up. RAII is huge, and I miss it in every language that doesn't have it.
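For readers who haven't leaned on it yet, a toy RAII wrapper (hypothetical class, sketch only): the destructor releases the resource on every exit path, exceptions included, so cleanup can't be forgotten:

```cpp
#include <cstdio>

// Toy RAII file handle: acquisition in the constructor, release in the
// destructor. Copying is deleted so two objects can never both think
// they own (and close) the same FILE*.
class File {
    std::FILE* f;
public:
    explicit File(const char* path, const char* mode)
        : f(std::fopen(path, mode)) {}
    ~File() { if (f) std::fclose(f); }   // runs on every exit path
    File(const File&) = delete;
    File& operator=(const File&) = delete;
    std::FILE* get() const { return f; }
};
```

Whether the scope ends normally, by early return, or by an exception unwinding through it, the file is closed; that "can't forget" property is exactly what's missing in languages without deterministic destruction.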
Being a self-taught programming hobbyist who's been at this since before C++98 was standardized, the more I learn about what's in the C++ toolkit, the more I discover how little I actually know.