This is the 'PHP Internals News' podcast, where we discuss the latest PHP news, implementations, and issues with PHP internals developers and other guests.
Similar Podcasts
The Cynical Developer
A UK based Technology and Software Developer Podcast that helps you to improve your development knowledge and career,
through explaining the latest and greatest in development technology and providing you with what you need to succeed as a developer.
Elixir Outlaws
Elixir Outlaws is an informal discussion about interesting things happening in Elixir. Our goal is to capture the spirit of a conference hallway discussion in a podcast.
ThunderCast
An inside look at the making of Mozilla Thunderbird, and community-driven conversations with our friends in the open-source software space.
PHP Internals News: Episode 103: Disjunctive Normal Form (DNF) Types
PHP Internals News: Episode 103: Disjunctive Normal Form (DNF) Types London, UK Friday, June 24th 2022, 09:07 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Disjunctive Normal Form Types" RFC that he has proposed with Larry Garfield. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 103. Today I'm talking with George Peter Banyard again, this time about a disjunctive normal form types RFC, or DNF, for short, which he's proposing together with Larry Garfield. George Peter, would you please introduce yourself? George Peter Banyard 0:39 Hello, my name is George Peter Banyard, I work on PHP paid part time, by the PHP foundation. Derick Rethans 0:44 Just like last time, we are still got colleagues. George Peter Banyard 0:46 Yes, we are indeed still call it. Derick Rethans 0:48 What is this RFC about? What is it trying to solve? George Peter Banyard 0:52 The problems of this RFC is to be able to mix intersection and union types together. Last year, when intersection types were added to PHP, they were explicitly disallowed to be used with Union types. Because: a) mental framework, b) implementation complexity, because intersection types were already complicated on their own, to try to get them to work with Union types was kind of a big step. So it was done in chunks. And this is the second part of the chunk, being able to use it with Union types in a specific way. Derick Rethans 1:25 What is the specific way? George Peter Banyard 1:27 The specific way is where the disjoint normal form thing comes into play. So the joint normal form just means it's a normalized form of the type, where it's unions of intersections. The reason for that it helps the engine be able to like handle all of the various parts it needs to do, because at one point, it would need to normalize the type anyway. And we currently is just forced on to the developer because it makes the implementation easier. And probably also the source code, it's easier to read. Derick Rethans 1:54 When you say, forcing it up on a developer to check out you basically mean that PHP won't try to normalize any types, but instead throws a compilation error? George Peter Banyard 2:05 Exactly. It's, it's the job of the developer to do the normalization step. The normalization step is pretty easy, because I don't expect people to do too many stuff as intersection types. But as can always be done as a future scope of like adding a normalization step, then you get into the issues of like, maybe not having deterministic code, because normalization steps can take very, very long, and you can't necessarily prove that it will terminate, which is not a great situation to be in. Imagine just having PHP not running at all, because it's stuck in an infinite loop trying to normalize the format. It's just like, oh, I can't compile Derick Rethans 2:39 Would a potential type alias kind of syntax help with that? George Peter Banyard 2:44 Maybe, I'm not really sure. Actually reading like research about it from computer scientists, in functional programming languages, which is everything is compiled on my head. And they have the whole thing was like, well, they need to type type normalize, and especially with type aliases, they haven't really figured out a way yet. So I'm not sure how we are going to figure out a way if experts and PhD students and researchers haven't really figured out a way. Derick Rethans 3:08 And is the reason for that mostly, because PHP, resolves types while it is running code sometimes because it has to overload classes, and then it might find out it is an inherited class, for example? George Peter Banyard 3:19 Yes, I think it's like this weird thing where might maybe PHP has like kind of an advantage, because it doesn't need to, like resolve all of the types at once. And if you have a type alias, it's just oh, if it's used, and you just need to resolve it, and then try to figure it out. There's also the added complexity of like, variance checks, because most functional programming languages, they have variance to some degree, but they don't have the whole inheritance of like typical OOP languages have. It's kind of a very strange field, the fact that yeah, PHP is just like, well, we kind of do stuff at runtime, and you don't necessarily need everything. And it just works is like, well, we'll do. That's mainly the reason why the dev needs to do the normalization step, the form is done. It's also I think, the most easiest to understand, it's just like, Oh, you have this and this, or this group, or stuff, or this group of stuff, or this thing, simple type. The other form would be another normalized form would be conjunctive normal form, which is a list of ANDs of ORs to just have this thing, or X, like (A or B or C) and X and (Y or Z), which I think is harder to understand. Derick Rethans 4:26 What is the exact syntax then? George Peter Banyard 4:28 So the exact syntax is, if you want to have an intersection type was in a union type, you need to like bracket it by parentheses. And then you have like the normal pipe union operator and you can mix it with single types, you can mix it with true, you can mix it with false, which are literal types, which now exist, or just normal, bool types. Derick Rethans 4:48 The parenthesis is actually required. You don't rely on operator precedence to make things work? George Peter Banyard 4:53 Yes. Relying on operator precedence is terrible. Derick Rethans 4:57 Yep, I agree. George Peter Banyard 4:58 I'd say Oh, yeah, but I think I've heard this argument on the list like a couple of times, it's just, oh, yeah, but maths, like, has like, and as priority over like, or, I mean, I did three years of a maths degree and not gonna lie. Maths notation is terrible for most of us. People don't even agree on terminology. I'm just gonna say, let's, let's just do better. Derick Rethans 5:19 I agree. I mean, most coding standards for any sort of variable for like conditions, will already require parenthesis around multiple complex clauses anyway, right? I mean, it's a sensible thing to do, just for readability, in my opinion. So the RFC also talks about a few syntax that you aren't allowed to do, and that you have to normalize or deconstruct yourself, what kinds of things are these? George Peter Banyard 5:41 if you would want to have a type which has an intersection of a class A with at least one other class, so let's say X or Y, but you can always convert it into DNF form, how this type would be, it would be (A and X) or (A and Y). This seems to be the more unusual case, I would imagine. One of the motivating cases of DNF types is to do something like Array or (Traversable and Countable). I don't really see mixing and matching various different object interfaces in differencing, the most useful user land cases to be able to do Array or (Traversable and Countable) so that you can use just count or seeing something as an array, or you have like Traversable and Countable and ArrayAccess. And it's just like, Oh, here's an object, which kind of behaves like an array. Derick Rethans 6:32 I think there's currently another RFC just being proposed, that extends iterator_to_array to multiple types as well to accept more things. So that sort of fits into this category of things to do with iterables and traversals then I suppose. George Peter Banyard 6:49 yeah Derick Rethans 6:50 I'm hoping to talk to the author of that RFC as well. At the moment where two and a half weeks or so before a feature freeze, you now see a whole flurry of RFCs while it was a bit quiet in the last few months. So because you're adding to the type system, that's also usually has consequences for variance rules, or rather, how inheriting works with return types and argument types, as well as property types. What do DNF types mean for these variance checks? George Peter Banyard 7:19 The variance is checks, kind of follow the similar rules as before. So property types are easy. They are invariant, so you can't change them. You can reorder types, like was in your union if you want to. But that was already the case with Union types previously, because PHP will just check that, well, the types match. So contravariant, you can always restrict types, meaning you can either add intersections, or you can remove unions, broadly speaking. What you could do, for example, if you have like A or B or C, you could do A and X as a subtype, because you're restricting A to be of an extra, like an extra interface. Derick Rethans 8:06 So then you will have (A and X) or B or C. George Peter Banyard 8:10 Yes. So that's one restriction. You can add how many interfaces you want and do an intersection type, you can add them on every type you can. On the other side, you can just add like unions. So if for contravariance, or like an an argument type, it's like, well, I just want to return something new, well, then you can add unions, but you can't add an intersection to a type, you can only widen types of arguments. So if your type is A or B or C, you can't do A and B, and you can't do (A and X) or B or C, because you're restricting the type. If your type would be (A and X) or (B and Y) or (C and Z), then you could lift the restriction to A or B or (C and Z) because you loosening the requirements on on the type that you're accepting. Derick Rethans 8:55 To summarize this: argument types, you can always widen; return types you can only restrict, and, and property types you can't change at all. I specifically wanted to summarize that because I always find contravariance and covariance. These names confuse me. So that's why I prefer to talk about widening types and restricting types instead. Because there are so close together for me. We spoke a little bit about redundant types. What is this new functionality do if you specify redundant types? George Peter Banyard 9:30 Redundant types how they currently work in PHP are done at compile time. And they do exact class matches or constant class aliasing matches. Derick Rethans 9:41 That will need an explanation. George Peter Banyard 9:44 Class names and interface names in PHP are case insensitive. So you can write a lower-case a or upper-case A and it means the same class. If you provide let's say lower-case a or upper-case A, the engine realize this, this is the same class, so we'll serve it on the type error. So PHP has use statements, or use as. So these are compile time aliases. If you define a class A, and then you say use A as B. So B is a compile time alias of A. And then you do a type which has A or B, PHP already knows these things refer to the same class. So it will raise a compile time error. Derick Rethans 10:25 These use aliases are per file only, right? George Peter Banyard 10:28 Yes, that's usually to do with if you import traits or like a namespaces. And you get conflicting class names. That's how you handle it about. PHP has also this feature, which you can do this at runtime, using the function called class_alias. Now, obviously, compile time checks are done at compile time. So it doesn't know at runtime that you aliasing these classes or using this name as an alias. So then PHP won't complain. Derick Rethans 10:53 But will don't complain during runtime. George Peter Banyard 10:56 No. Derick Rethans 10:56 You really just wanted to shoot yourself in the foot, we'll let you do this. George Peter Banyard 11:00 Yet, during this at runtime, just as like a whole layer of time, because it's not it's not really useful. Basically, what it means that PHP won't guarantee you the type is minimal. I.e. you might have redundant types, but it will just try to tell you, it's like oh, the- these are exactly the same types. And I know these are the same types, you probably do get mistake. So if it can determine this at compile time, it will tell you. Derick Rethans 11:23 The variance is still checked when you're passing in things. George Peter Banyard 11:26 Yes, so variance is checked on inheritance. When the class is inherited and compiled, because it needs to load the parent class, it will then check that it's built properly, and otherwise it will raise an error, that's fine. But just checking that the types is minimal is not possible. A) because inheritance, you don't know how it works, because it will only do the checks on basically on the name of the strings, it will do like compare strings of class names. And if it doesn't know the class name, or if it or if it needs to do some inheritance, it just won't do an instance of check. They just ignore that. It's just like, well, maybe it is maybe it's not I don't know. And that's fine. Derick Rethans 12:08 Of course, if you pass in a wrong type at runtime, then it will still get rejected during runtime anyway. George Peter Banyard 12:14 Yes, that hasn't changed. Derick Rethans 12:16 The only thing that you might end up in a situation where you don't get warned during compile time whether type is redundant. George Peter Banyard 12:23 Yes. So that's the behaviour we currently are the behaviour is added. So, it will check that two intersection types within the union are identical using the same class stuff. So for example, if you have class A, and you say use a as B, and then you have a type which is (A and X) or (B and X), it will tell you: Okay, these classes are the same. The check it adds now also it will check that you don't have a more restrictive type with a wider type. So if your type is T or (T and X), because T is wider than T and X, it will error at compile time, it'll tell you well, T is less restrictive than T and X. So the T and X type is redundant. Derick Rethans 13:11 Okay, so nothing strange. Basically, what you expect to happen will happen. And PHP does its best telling you at compile time whether you've done something wrong or not. George Peter Banyard 13:22 Yes. Derick Rethans 13:24 I think we've spoken mostly about the functionality itself and types. I'm a little bit interested in whether you encountered some interesting things while implementing this feature. George Peter Banyard 13:33 This feature basically, was a bit in limbo for the implementation, because I was waiting on a change to make Iterable, a compile time alias of Array or Traversable, which shouldn't affect userland. Because previously, all of the checks needed to cater to if you get Iterable, then you need to check for the variance. Has it Array , has it a Traversable type, does this accept? Is it why the is it more restrictive, it's identical. It's just this weird edge case, which makes the variance code harder. Moving this to a compile time alias, where now it just uses the standard, a standard union type in some sense, just makes a lot of the variance checks already streamlined and simpler. And because this is simpler, in some sense, was DNF types. When you hit the intersection, you need to recurse one step to check the variance. This helps. This is also kind of why DNF types are enforced like as like the structure on the dev because otherwise, you could potentially get into the whole like, oh, infinite recursion if you do like very nested types, because it's just like, oh, you hit one nested type and so, oh okay, now I'm again in unnecessary time and then you recurse again and then you recurse again, and so that's all you get into the thing: Oh you need to normalize the type. The variance check is: Can you see if it's a union type is the first type a sub list So a list of intersection types, okay, is it balanced? And then just recall the same function in some sense, like, check the types for variance, is this correct? Okay, move to the next type back into the Union and everything. So the implementation is conceptually simple, because all of the implementation details already exist. And all the everything hard has already been done. Now, it's just like, in some sense, it was extracting it into its own function, and then like recurse into it, and not forget to update opcache properly. Derick Rethans 15:31 You mentioned that in order to make the DNF types work, you were waiting on this Array or Iterable or Traversable kind of type. Is this also type people can use it and userland? Or is it internal only? George Peter Banyard 15:44 It is the standard Iterable type that you can already use. So currently, PHP considered Iterable, a full type in some sense. And what the this implementation change basically makes it Iterable into ... compile time alias of Array or Traversable. Iterable exists since, PHP, 7.1, I think. Can still use it, reflection should still be fine if you use it as a single type. Derick Rethans 16:08 So to change there is more, instead of: if you encounter Iterable, we check for both Array and Traversable. Then, instead of making the check every time you look at Iterable is already part of the type system, so you don't have to make the check every time. George Peter Banyard 16:23 Exactly, you basically move when it's being transformed in some sense. Now it has some repercussion on other parts, which needed to be taken care of, which is probably why it was in limbo for 10 months. I had already done the implementation of DNF types, basically, working on my local copy of that branch. It's just like: Okay, this got merged, nice, I can now open the PR onto PHP SRC. So I didn't wait for it to land until start working on it. Derick Rethans 16:50 Things like that also often affect reflection, because you're adding more complex types to the type system. So what kind of changes does that make to PHP's reflection system? And does this end up breaking backwards compatibility? George Peter Banyard 17:04 So in theory, no, it doesn't. How the reflection API works around the type system is that most method calls will turn a reflection type interface, ReflectionNameType, ReflectionUnionType, and ReflectionIntersectionType, are all instances of a ReflectionType. And methods if you would call on the list. So on a union type, the type it would return if you get like getTypes is a ReflectionType. The type system and how the reflection idea was designed, there is no BC break. How the standard was working, it's like, Oh, if you had like a union type, or an intersection type, if you call the getList or getListOfTypes, or getTypes, I don't remember exactly what the method name is actually called, you will always get an array of reflection name types, because you can only have like one level of list in some sense. However, now, if your top type is a union type, then if you get getTypes, you might get an array of ReflectionNameTypes with ReflectionIntersectionTypes. So that's the case that you now need to cater to. So if you get another ReflectionIntersectionType in between. There, you could only have ReflectionNameTypes, there was no nesting, whereas now if you have a union type, one of the types that you get back from the getTypes method in the array will be a ReflectionIntersectionType. Technically, all of the types of the part of the reflection type, so it's an array of reflection types that you get. How it worked before is that you didn't need to care about this distinction between: Oh, it returns a ReflectionType and a ReflectionNameType because well, it only return a ReflectionNameType. But now this is not the case. So you now need to cater to that that oh, you might have nesting. Which kind of boils down to like if in the future, we decide to like have oh, you can nest union types in an intersection type, then the getTypes method might return a union type with other name types. Derick Rethans 19:03 You just need to make sure that you check for more than just one thing that it previously would have done. You can't assume not everything is a ReflectionType any more. It could also be ReflecionIntersectionType. George Peter Banyard 19:18 Yes, exactly. Derick Rethans 19:20 I think that sort of what's in the RFC, is there any future scope? George Peter Banyard 19:25 I mean, the future scope is type alias. As usual. Everything I feel when you talk about the type system, it's like type aliases. At one point when your types gets very complicated. It would be nice to just be able to refer this as a as a named type in some sense, instead of needing to retype every time the whole union slash intersection of it. Hopefully we can get this running for 8.3. We are starting to get kind of complicated types. It would be nice being able to have this feature. The other obvious future scope in some sense, who knows if it's actually desirable is to allow either having conjunctive normal form so you can have like a list of ANDs or ORs Derick Rethans 20:05 You call these conjunctive normal forms? George Peter Banyard 20:08 Yes. Or just a type, which is not normalized. Not sure if it's really desirable to have this feature, because then you get into the whole thing of, if PHP doesn't, either PHP doesn't know how to like normalize it, or it's not in the best form, and then you get into like, very long compilation units or just checking. It's like, okay, does it respect the type? Does it do all of the instance of checks? And I'm not sure if it's super desirable. Derick Rethans 20:38 So it could be considered future scope. But from what I gather from you, you don't actually know what it is actually a desirable thing to add to the language? George Peter Banyard 20:46 Yes. Derick Rethans 20:47 Okay, George, thank you for taking the time this morning to talk about this new DNF types RFC. George Peter Banyard 20:54 Thank you for having me. As always. Derick Rethans 20:59 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Disjunctive Normal Form Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 102: Add True Type
PHP Internals News: Episode 102: Add True Type London, UK Thursday, June 2nd 2022, 09:06 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Add True Type" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:00 Hi I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 102. Today I'm talking with George Peter Banyard about the Add True Type RFC that he's proposing. Hello George Peter, would you please introduce yourself? George Peter Banyard 0:33 Hello, my name is George Peter Banyard, I work part time for the PHP Foundation. And I work on the documentation. Derick Rethans 0:40 Very well. We're co workers really aren't we? George Peter Banyard 0:43 Yes, indeed, we all co workers. Derick Rethans 0:45 Excellent. We spoke in the past about related RFCs. I remember, which one was that again? George Peter Banyard 0:51 Making null and false stand alone types Derick Rethans 0:53 That's the one I was thinking of him. But what is this RFC about? George Peter Banyard 0:56 So this RFC is about adding true as a single type. So we have false, which is one part of the Boolean type, but we don't have true. Now the reasons for that are a bit like historical in some sense, although it's only from PHP 8.0. So talking about something historical. When it's only a year ago, it's a bit weird. The main reason was that like PHP has many internal functions, which return false on failure. So that was a reason to include it in the Union types RFC, so that we could probably document these types because I know it would be like, string and Boolean when it could only return false and never true. So which is a bit pointless and misleading, so that was the point of adding false. And this statement didn't apply to true for the most part. With PHP 8, we did a lot of warning to value error promotions, or type error promotions, and a lot of cases where a lot of functions which used to return false, stopped returning false, and they would throw an exception instead. These functions now always return true, but we can't type them as true because we don't have it, and have so they are typed as bool, which is kind of also misleading in the same sense, with the union type is like, well, it only returns false. So no point using the boolean, but these functions always return true. But if you look at the type signature, you can see like, well, I need to cater to the case where the returns true and when returns false. Derick Rethans 2:19 Do they return true or throw an exception? George Peter Banyard 2:22 Yeah, so they either return true, or they either throw an exception. If you would design these functions from scratch, you would make them void, but legacy... and we did, I know it was like PHP 8.0, we did change a couple of functions from true to void. But then you get into these weird shenanigans where like, if you use the return value of the function in a in an if statement, null gets because in PHP, any function does return a value, even a void function, which returns null. Null gets coerced to false. So you now get like, basically a BC break, which you can't really? Yeah, we did a bit and then probably we sort of, it's probably a bad idea. That's also the point of like, making choices, things that are static analysers can be like, more informants being like, Okay, your if statement is kind of pointless here. Derick Rethans 3:06 Yeah, you don't want to end up breaking BC. Now, we already had false and bool, you're adding true to this. How does that work with Union types? Can you make a union type of true or false? George Peter Banyard 3:18 No. So there are two reasons mainly. A. true and false is the same as like boolean, which is like just use Boolean in this case. But you can say, well, it's more specific, so just allow it. So that's would be reasonable. But the problem is, false has different semantics than boolean. False does not coerce values. So it only accepts false as a literal value. Whereas boolean, if you're not in strict type, which is a lot of code, it will cause values like zero to false one, or any other integers to true. It will coerce every other integer to true, like the true type follows the behaviour of false of being a value type. So it only accepts true, you would get into this weird distinction of does true or false, mean exactly true or false? Or do you get the same behaviour as using the boolean type? Derick Rethans 4:07 So I would say that true or false would than be more restrictive than bool. George Peter Banyard 4:12 Exactly, which is a bit of a problem, because PHP internally has true and false and separate types, which also makes the implementation of this RFC extremely easy, because PHP already makes the distinction of them. But at the same time, the boolean type is just a union of the bitmask of true and false. You can't really distinguish between the types, true or false, or the boolean type within the type system. Currently just does it by checking if it only has one then it can do like two checks. Specifically, you would need to add like an extra flag. I mean, it's doable, but it's just like, Well, who knows which semantics we want? Therefore, just leave it for future discussion because I'm not very keen on it to be fair. Derick Rethans 4:55 True or false are really only useful for return values and not so much for arguments types, because if you have an argument that that always must be true, then it's kind of pointless to have of course. George Peter Banyard 5:05 Same as like it was with the null type RFC. Although there might be one case where PHP internal functions might change the value to true for an argument, I can maybe two types, would be like with the define function, this thing being like case insensitive or case sensitive, I don't remember what the parameter actually; could actually either be false or true, because at the moment, I think emits a notice, things do like the this thing is not supported, therefore the values what was ignored. But we could conceivably see that in PHP 9, we would actually implement this as a proper like: Okay, this only accepts true, yes, this argument is pointless, but it's in the middle of the function signature, so you can't really move it. The spl_register_overload function has like as its second argument, the throw on error or not, which since PHP 8 only accepts true, but it's in the middle of the function. The last argument is still very useful. It's prepend, instead of append the autoloader, I think, or might be the other way around, check the docs. Since PHP 8, this only accepts true. So if you pass in false, it will emit a notice and saying you'd like this argument has just been ignored. So whatever. But we can't really remove the argument. Because well, it's, if you use the third argument, as with positional arguments, then you would change like the signature and you would break it. Now, we don't have a way to enforce in PHP to use named arguments, because that would be a solution. It's just like, well, if you want to set this argument, you need to use named arguments, but we can't do that. Otherwise, then creating a new function, which has an alias, which is also kind of terrible. That would be one of the maybe only cases where you would actually get like true as a as an argument Derick Rethans 6:39 is that now currently bool? And there's a specific check for it? George Peter Banyard 6:42 It's currently bool, and if you pass in false enrolment, like a warning, or notice. Derick Rethans 6:47 How would inheritance work? As return types, you can always make them smaller, right? More restrictive. George Peter Banyard 6:53 Yes, that's also the thing. But that already exists in some sense a problem of. Like if you go from boolean to false, you're already restricting the type. And that problem existed, even before the restricting, well allowing false as a stand-alone type if you had like, as a union, because you could always say like, I don't know. That problem already existed with Union types. Because you could have something like overturn an array or bool and then you change it to either an array or false. And then if you try to return like zero, then you will get like a coercion problem. So the same problem applies with true, because it only affects return values. And like you control the code within a function compared to like how you pass it, that's less of an issue. It applies also, with argument types where you can go from true to like boolean, or true and like a union type. Derick Rethans 7:44 So there's nothing surprising here. I see that the RFC also talks a little bit about future scope. Can you tell a bit more about that? George Peter Banyard 7:53 True and false are part of what are called value types, they are a specific value within the type. One possible future scope would be to expand value types to all possible types. So that you could say, oh, this function returns one, two or three. Derick Rethans 8:08 Would you not rather use an enum for that? George Peter Banyard 8:09 Exactly. That's the point I was going to make is that enums serve this purpose, in my opinion. And as a type purist, ideally, I would have preferred that we didn't have to enforce because the code, it kind of goes against the grain in this sense. Derick Rethans 8:23 We've had it for 25 years, booleans. George Peter Banyard 8:26 Yes, right. But boolean is its own type, in some sense, which you could say is a special enum. Enums are types. But we have false, and not having true is just so weird to me. It's like, oh, you've got this thing, but you don't have this other thing. And there are loads of cases where functions return true, or due to legacy reasons and to preserve BC, and PHP 8 promoted a bunch of warnings to to error. So now you've got functions which used to return false, don't return false any more. And they only return true. Now, some of the famous examples are probably like array_sort of, like actually, the sorting array functions, return true for basically all of PHP 7. I think there was something changed in PHP 7, probably was the engine or something like that, that they stopped returning false, which is strange. And I've made the discovery somewhat recently, I'm like, this is so pointless, because you see loads of loads of code checking like that the return value of the sort function is correct. Derick Rethans 9:20 It's also that most of the sort functions actually sort by reference instead of returning the sorted array, which I can understand as a performance reason to do but... George Peter Banyard 9:29 it's not very functional. You modify stuff in place and like passing it around. And because yeah, I think the initial thing was that like, well do it would return a false or true because sometimes it could, the sort could fail. Derick Rethans 9:42 I don't understand how a sort could failure, but there we go. George Peter Banyard 9:46 I mean, I suppose if you have like incomparable values within the array like that somewhat logical, I suppose. Derick Rethans 9:53 Was there anything else in future scope? George Peter Banyard 9:56 One of the future scope, I feel was everything type related. It's like type aliases, because when you start making more complicated types, having a way to type alias, it is probably nice. Don't think we'll get this for PHP 8.2. I don't think we any of us had the time to work on it. Derick Rethans 10:11 Well, we only have a month left anyway. George Peter Banyard 10:13 Yeah. And I mean, I'll probably be back on here. I'm trying to get DNF types working, but... Derick Rethans 10:19 Can you explain that these are? George Peter Banyard 10:20 Disjoint normal form types? Derick Rethans 10:22 That did not help. George Peter Banyard 10:24 But it's the being able to combine union types with intersection types together, Derick Rethans 10:28 I can understand that doing that is kind of complicated. You also need to sort of come up with a with a language to define them almost right? I mean, you then get the argument, are you going to require a parenthesis around things? George Peter Banyard 10:38 I'm requiring parentheses. People have told me the argument of like: Yeah, but in maths like and takes priority, it's just like, have you seen mathematicians, mathematicians don't agree on notation, and it's terrible, or they call stuff and the different they call it something is like, oh, sometimes a ring is commutative, and sometimes it's not. Don't follow mathematicians, don't follow mathematician, Derick Rethans 10:57 Type aliases is something that would only apply to single files. See, that's what you're suggesting. And then there's exported type definitions, which I guess could be autoloaded at some point; would be nice to have, I guess. George Peter Banyard 11:09 I think that's the trouble just like defining the semantics. Type aliases within a file are nice, but they're not very useful. Most of the time, you would want to export the type. For example, if you say: Oh, I accept, I don't know, something which looks like an array, which is like an array and like Traversable, and ArrayAccess or something. I'm sure, it's nice to have it in your own file. But like, if you use it around a project, and you need to redefine the type, every single file kind of defeats the purpose. Derick Rethans 11:35 It's kind of tricky to do with type definitions, because you sort of need to make sure that there are available and maybe can be autoloaded, just like classes can be right. And that makes things tricky. Because having a type definition and just three lines in a file, is kind of annoying, but I guess that is sort of necessary to do the kind of thing in a PHP ish way. George Peter Banyard 11:55 Yeah, we talked about it with Ilija because he he was on about it. And I was like: Well, ideally, you would want the separate autoload of types. That's how I initially conceived it, it's like having a different autoloading for types. But then the problem is, is like if anytime you hit a class, like in an argument, if you autoload the type first, it will go through all of the type definitions. And if, okay, at the moment, that wouldn't be there wouldn't be much. But if you go into like importing 30 composer projects, or libraries, which are define their own types, it will go through all of those first, before going to the classes autoloaded, and trying to find it then, which is not ideal. Yeah, it's going to be a tricky problem. It's either you merge these symbols together. But then the class table is not always a class. And sometimes you can't do new type. Like I said, tricky problems. Derick Rethans 12:43 Yeah, that's a tricky problem, but an interesting one. George Peter Banyard 12:47 Yeah. Derick Rethans 12:47 So that's future scope then. George Peter Banyard 12:50 Exactly. That is future scope. Derick Rethans 12:52 Do you have anything else to add? George Peter Banyard 12:54 Um, no, not really. I think I've said all I have to say it's pretty straightforward. Should be uncontroversial, hopefully. Derick Rethans 13:02 It currently looks like it's 20 for, and zero again. So I guess it will pass. George Peter Banyard 13:07 Brilliant. Derick Rethans 13:08 Who said that, that if your RFC ends up passing unanimously, it is too boring? George Peter Banyard 13:13 Nikita. Derick Rethans 13:14 Which is not incorrect. George Peter Banyard 13:16 It is not incorrect. But I mean, at the beginning, because I was like: Well, this is pretty straightforward. So I wrote the RFC, it was tiny. And I put it on to the list and people was like: Yeah, but what's the motivation for? I understand for adding false, because they already exist. But what's the motivation for adding a new type, and I was like, I now need to go back to the drawing board and write more. To be fair, that was a smart, because I then discovered the whole issue about true and false. That false is just a value type and doesn't do coercions. And it's like, okay, how do you handle the semantics and everything? Derick Rethans 13:46 I'm glad to hear it. Then all I have to say thank you for taking the time today to talk about this new RFC. George Peter Banyard 13:53 Thank you for having me as usual. Derick Rethans 13:59 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@php internals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Add True Type RFC: Allow Null and False as Standalone Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 101: More Partially Supported Callable Deprecations
PHP Internals News: Episode 101: More Partially Supported Callable Deprecations London, UK Thursday, May 19th 2022, 09:05 BST In this episode of "PHP Internals News" I talk with Juliette Reinders Folmer (Website, Twitter, GitHub) about the "More Partially Supported Callable Deprecations" RFC that she has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 101. Today I'm talking with Juliette Reinders Folmer, about the Expand Deprecation Notice Scope for Partially supported Callables RFC that she's proposing. That's quite a mouthful. I think you should shorten the title. Juliette, would you please introduce yourself? Juliette Reinders Folmer 0:37 You're starting with the hardest questions, because introducing myself is something I never know how to do. So let's just say I'm a PHP developer and I work in open source, nearly all the time. Derick Rethans 0:50 Mostly related to WordPress as far as I understand? Juliette Reinders Folmer 0:52 Nope, mostly related to actually CLI tools. Things like PHP Unit polyfills. Things like PHP Code Sniffer, PHP parallel Lint. I spend the majority of my time on CLI tools, and only a small portion of my time consulting on the things for WordPress, like keeping that cross version compatible. Derick Rethans 1:12 All right, very well. I actually did not know that. So I learned something new already. Juliette Reinders Folmer 1:16 Yeah, but it's nice. You give me the chance now to correct that image. Because I notice a lot of people see me in within the PHP world as the voice of WordPress and vice versa, by the way in WordPress world to see me as far as PHP. And in reality, I do completely different things. There is a perception bias there somewhere and which has slipped in. Derick Rethans 1:38 It's good to clear that up then. Juliette Reinders Folmer 1:39 Yeah, thank you. Derick Rethans 1:40 Let's have a chat about the RFC itself then. What is the problem that is RFC is trying to solve? Juliette Reinders Folmer 1:46 There was an RFC or 8.2 which has already been approved in October, which deprecates partially supported callables. Now for those people listening who do not know enough about that RFC, partially supported callables are callables which you can call via a function like call_user_func that which you can't assign to variable and then call as a variable. Sometimes you can call them just by using the syntax which you used for defining the callable, so not as variable but as the actual literal. Derick Rethans 2:20 And as an example here, that is, for example, static colon colon function name, for example. Juliette Reinders Folmer 2:26 Absolutely, yeah. Derick Rethans 2:27 Which you can use with call_user_func by having two array elements. You can call it with literal syntax, but you can't assign it to a variable and then call it. Do I get that, right? Juliette Reinders Folmer 2:36 Absolutely. That's it. There's eight of those. And basically, the original RFC from Nikita proposed to deprecate support for them in 8.2, add deprecation notices and remove support for them altogether in PHP nine. And the original RFC explicitly excluded two particular things from those deprecation notices. That's the callable type and using the syntaxes in combination with the is_callable function, where you're checking if the syntax is callable. The argument used in the original RFC was to keep those side effect free. The problem with this is that with the callable type, this means you go from absolutely no notice or nothing, to a fatal error in PHP 9. Everything works, and you're not getting any notification. But in PHP 9, its fatal error at the moment that callable is being passed to a function. Derick Rethans 3:31 This is the callable type in function declarations. Juliette Reinders Folmer 3:33 Yeah, absolutely. And with is_callable, I discovered a pattern in my wanderings across the world where people use the syntax in is_callable, but then use it in a literal call. So not using call_user_func, not using a variable to call it, but it's callable static double colon method name, and then called static double colon method name as literal. And that pattern basically, for valid calls would mean that that function would no longer be called in PHP 9 without any notification whatsoever. Derick Rethans 4:13 So it's a silent change that you can't detect at all. Juliette Reinders Folmer 4:17 Yeah, which to me sounded dangerous. I started asking some questions about that. But six weeks ago, the conclusion was, well, maybe this should be changed. But as this was explicit in the original RFC, we can't just change it. We need to have a new RFC to basically amend the original RFC and remove the exception for these two situations and allow them to throw deprecation notices. Derick Rethans 4:44 What are you proposing to change with this RFC than? Juliette Reinders Folmer 4:47 What this RFC is proposing is simply to remove the exception that the callable type and is_callable are not throwing a deprecation notice. This RFC is proposing that they should throw a deprecation notice, so that more of these type situations can be discovered in time for PHP 9 to prevent users getting fatal errors. Derick Rethans 5:08 Now, of course, we have no idea when PHP nine is actually showing up, but I don't think it will be this year. Well, I know it won't be this year, and it certainly won't be be next year neither, I think. Juliette Reinders Folmer 5:17 That's all the same. I mean, it makes there'll be two, three years ahead, but it doesn't really make sense to have the main deprecation in 8.2 and then have the additional deprecation in 8.4 or something. Derick Rethans 5:29 Absolutely. Juliette Reinders Folmer 5:30 It's a lot more logical to have it all in in the same version. Because it's all related. It's basically the same thing without the exception for callable type. And is_callable. Derick Rethans 5:42 Although there is no current application, would this be able to be found if you had like a comprehensive test suite? Juliette Reinders Folmer 5:48 Yes and no. Yes, you can find this with a test suite. But one, you're presuming that there are tests. Two, that the tests covered the effected code with enough path coverage. Three, imagine a test you've written yourself at some point in the past where which affected callables, you might have, you know, a data provider where you say: Okay, valid callable function, which you've mocked or, you know, closure, which you've put in and second, this function does not exist. Okay, so now you're testing this function, which at some point in its logic has a callable, and expects that type to receive that type. But are you actually testing with the specific deprecated partially supported callables? Even if you have a test, and the test covers the affected code, if you do not test with one of these eight syntaxes, which has been deprecated, you still cannot detect it. And then, four, you still need to make sure that the tests are routinely run, and in open source, that's generally not a problem. Most open source projects, use GitHub actions by now to run the tests automatically on every pull request, etc. But, have the tests been turned on to actually run against PHP 8.2. Are the tests run against pull requests? I mean, there are still plenty of projects, which don't do that kind of thing. Yes, you can detect it with a good test suite. But there's a lot of caveats when you will not detect it. And more importantly, you will not be able to detect it until PHP 9. Derick Rethans 7:23 Yes, when your code and stops behaving as you were expecting it to be. Juliette Reinders Folmer 7:28 Yeah, because in 8.2, you're gonna get deprecation notices for everything else, but these two situations. But not in 8.2, not in 8.3, not in 8.4, and then whatever eights we're gonna get until nine, you will not be able to detect without deprecation notices, until PHP 9 actually removes support for these partials deprecated callables. Yes, but no. Derick Rethans 7:53 We already touched a little bit on how you found out for the need for this RFC or for changing behaviours. But as people have stated in the past, adding deprecation notices is a BC break. That's a subject that we will leave for some other time because I don't necessarily believe that. But would, the changes in your RFC not add more backwards compatibility issues? Juliette Reinders Folmer 8:14 The plain and simple the backward compatibility break is in the original RFC. That's where the deprecation is happening. This RFC just makes it clearer where the BC break is going to be in PHP 9. It's not PHP 8.2, which has a backward compatibility break. It's PHP 9 which will have to backward compatibility break. Yes, I've heard all those arguments, people saying deprecation notes are BC break, no they're not. But they are annoying. And Action List, to for everything that needs to be fixed before 9. Given big enough projects, you cannot say: Okay, I'm gonna do this at the last moment, just before 9 comes out. It literally means 10 months of the year I for one am working on getting rid of deprecation notices in project to prepare them all to be ready for PHP 9 when PHP 9 comes round. Derick Rethans 9:06 But it's still better to have them than to not,.and then you code starts breaking right? Because that is exactly why you're proposing this RFC as far as I understand. Juliette Reinders Folmer 9:16 Yes, absolutely. I mean, I'm always very grateful for deprecation notices, but it would be nice if we had fewer changes, which would cost them, for a year or two, so I can actually catch my breath again. Derick Rethans 9:28 I think PHP 8.2 will have fewer of these changes in there. There will still be some of course. Juliette Reinders Folmer 9:35 Well, I mean, this one is one deprecation. And then we have the deprecated Dynamic Properties and that one is already giving me headaches before I can actually start changing it in a lot of projects. I'm not joking, that one really is going to cause a shitload of problems. Derick Rethans 9:51 It's definitely for products have been going on for so long, where dynamic properties are used all over the place. And I see that in my own code as well. I just noticed this morning does actually breaks Xdebug. Juliette Reinders Folmer 10:03 I know it's currently breaking mockery, we're gonna have to have a discussion how to fix that or whether or not to fix it. If Mockery is broken, that means all your tests are broken. So the test tooling needs to be fixed first. Derick Rethans 10:18 That's always the case, if you work with CLI tools that make people run code on newer PHP versions, that's always a group of tools that needs to be upgraded first, which is your sniffers, your static analysis, your debugger still will always need to go first. Juliette Reinders Folmer 10:27 Which is why I look at things a lot earlier, probably then the majority of people. I mean, I see him huge difference between the open source and closed source community. For open source, I started looking at it well, I've been looking at 8.2 since the beginning. And I started running tests for all the CLI tools. As soon as 8.1 comes out, 8.2 gets added to the matrix for running in continuous integration. And then for applications, it gets added like in you know, once alpha 1-2-3 has come out. For the extensions, it gets added in September once the first RFC gets added. And all of them are trying to get ready before the release of 8.1 or 8.2 in this case, because you do not know as an open source maintainer, what version people are going to run your code on. And you can say IP, you can manage that via Composer, no you can't. Sorry, you can only do that if your users are actually installing via Composer. If your users are downloading a zip file, and uploading it to a web host via FTP, there's literally no way you can control whether they're running on 8.0, or 8.1, except for maybe during check: You cannot run on 8.1 yet. Derick Rethans 11:52 Upgrading software with version support is an issue that's been going on for 40 years and will go on for at least another 40 more. This is not a problem that we can solve easily. Juliette Reinders Folmer 12:03 But what I see there is like the closed source community is like, oh, yeah, but you know, by the time I want to upgrade my server to 8.1, or 8.2, I just run Rector and all will be fine. And I'm like, yeah, sorry, that does not work for open source. We need cross version compatible with multiple versions. And I try to keep that range of version small for the project, I initiate, I don't always have control over it. If for instance, one of the projects I maintain is Requests. And that's a project which does HTTP requests. It's used by WordPress, it cannot be let go of the minimum of 5, PHP 5.6, until WordPress, lets go of that. Derick Rethans 12:48 Well, the alternative is that WordPress uses an older version until it can let go of it. Juliette Reinders Folmer 12:54 Yeah, the only problem then is that we don't want to maintain multiple stable branches. For security fixes. Derick Rethans 13:03 For Xdebug, what I do is I support what the PHP project support when a PHP release comes out, which is a bit longer than PHP itself usually, but not by much more than a year or two. Juliette Reinders Folmer 13:15 I understand that. And I mean, I applaud Sebastian for at some point, having the guts to say to the community, I'm limiting the amount of versions I'm supporting. And I'm sticking to the officially supported PHP versions. That does not mean that that didn't give a large part of community which does need to support a wider range of PHP versions a problem. I fully support that people limit the amount of fish and stay support and like Sebastian, who I know got half the community up in arms against him when he said, I'm not going to support older PHP versions any more. It did create a problem and but the problem which I've tried to solve for instance with the PHP unit polyfills, which now is solvable by using the PHP Unit polyfills in quite a transparent way, which is helpful for everyone. It takes the complainers of Sebastian's back, and at the same time, it allows them to run the test. Derick Rethans 14:10 I think another good thing that Sebastian recently has done is make sure that deprecation notices are no longer failing your tests. Juliette Reinders Folmer 14:17 I don't agree. The thing is, I do understand him making that change. But changing that default from not showing those deprecation notices or not not allowing deprecation notes to fail the test, or not in a patched version, I don't think was the right thing to do. That should have been in a minor, let alone or maybe even in a major not in a 9.5.18 patch version. Also with the whole idea, I mean, again, this is very much an open source versus closed source discussion for closed source I completely understands that people say I don't want to know until I actually am ready to upgrade to that version. Derick Rethans 14:56 I understood it's more of a difference not necessarily between open and closed source, but rather between library maintainers and application maintainers. And the applications can then also be closed source. Juliette Reinders Folmer 15:06 The open source work I work in, I mean, I do want to see them. And the problem with the deprecation notices anyhow, and I've seen various experiments via Twitter fly past for the past year. Say you build something on top of something else, you want to see the deprecation notices and the errors which apply to your code. We don't want to see the ones which come from the framework on which you build on top. The silencing deprecation notices or not, allow tests to error out on deprecation and just not solve that problem. Derick Rethans 15:39 The only thing it does is make things a little bit less noisy so that fewer people complain to library authors isn't it? That's pretty much what it does. Juliette Reinders Folmer 15:48 The thing would I see what it has done is that people think the tests are passing. Derick Rethans 15:54 Well they are passing, but... Juliette Reinders Folmer 15:56 Yeah, but most people don't read change logs of PHP unit, especially as releases don't get actually have to change log included. When PHP Unit releases its actual release, it doesn't actually post a release on GitHub. So people who watch the PHP unit repo for releasing doesn't, don't get notifications, let alone a changelog. So they actually have to go into the repo to find out what has changed. Most people don't do that. They just get you know depend-a-bot update, which won't say much, because again, it doesn't have release information. Derick Rethans 16:28 It'd be nice, maybe if Composer ,when you upgrade packages, that it can show like the high level changes when you do an upgrade. The Debian project does that if you upgrade packages that have like either critical or behavioural changes, you actually get a log when you run the update. Juliette Reinders Folmer 16:43 And then the change should have been in major or minor, because in a patch release, you don't expect it kind of changes. I also know the struggle there. They've been going through to four PHP units and which is similar to what I'm struggling with with the amount of changes from PHP 8.0 and 8.1 which has to be deal dealt with. Projects are being delayed, we're having trouble keeping up as an open source community, we still need to look after our own mental health as well. Derick Rethans 17:10 What has the feedback been to far on the RFC or non? Juliette Reinders Folmer 17:13 The feedback on this particular RFC has been next to nothing. And that's not surprising. I mean, basically, the discussion has happened before. And I started the discussion six weeks ago, eight weeks ago, which led to this RFC. So far the responses, which I have seen, either on Twitter or in private or in our people will read through the RFC. They're like, yeah, it makes sense. Derick Rethans 17:37 I think this is quite a nicer way of getting RFCs done, you discuss them first. And if there's then found a need actually spend a time on writing an RFC. In other cases, the other way around happens, right? People write a long, complicated RFC, and then complain that nobody wants to talk about it. Juliette Reinders Folmer 17:53 When I started the previous discussion, it was I see this, I noticed this, was this discussed? And then I got back: yeah, nobody actually discussed the previous RFC and I'm like: Okay, so what's this whole point about under discussion if nobody's discussing? Derick Rethans 18:10 Well, you can't force people to talk, of course. Juliette Reinders Folmer 18:14 It does make me wonder, again, what we were talking about before, people who work in managed environments versus people who will have to support multiple PHP says, I sometimes wonder how many people who actually have voting rights work in those closed environments, and think, you know, upgrading is something you do with Rector. Now I have a feeling that often open source gets a little forgotten. Derick Rethans 18:38 Yeah, that's perhaps true. Thank you for taking the time this morning to talk about this RFC then. Juliette Reinders Folmer 18:44 Thank you Derick for having me. It was a pleasure to do you like always. Derick Rethans 18:49 Thanks. Derick Rethans 18:54 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Expand deprecation notice scope for partially supported callables Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 100: Sealed Classes
PHP Internals News: Episode 100: Sealed Classes London, UK Thursday, March 24th 2022, 09:04 GMT In this episode of "PHP Internals News" I talk with Saif Eddin Gmati (Website, Twitter, GitHub) about the "Sealed Classes" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 100. Today I'm talking with Saif Eddin Gmati about the sealed classes RFC that they're proposing. Saif, would you please introduce yourself? Saif Eddin Gmati 0:31 Hello, my name is Saif Eddin Gmati. I work as a Senior programmer at Les-Tilleuls.coop. I'm an open source enthusiast and contributor. Derick Rethans 0:39 Let's dive straight into this RFC. What is the problem that you're trying to solve with it? Saif Eddin Gmati 0:43 Sealed classes just like enums and tagged unions allow developers to define their data models in a way where invalid state becomes less likely. It also eliminates the need to handle unknown subtypes for a specific model, as using sealed classes to define models gives us an idea on what child types would be available at run time. Sealing also provides us with a way for restricting inheritance or the use of a specific trait. For example, if we look at logger trait from the PSR log package that could be sealed to logger interface. This way, we ensure that every use of this trait is coming from a logger not from any other class. Derick Rethans 1:24 I'm just reading through this RFC tomorrow, again, and something I didn't pick up on reading to it last time. It states that PHP already has sort of two sealed classes. Unknown Speaker 1:35 Yes, the throwable class in PHP can only be implemented by extending either error or exception. The same applies for DateTime interface, which can only be implemented by extending DateTime class or DateTime Immutable class. Derick Rethans 1:52 Because PHP itself doesn't allow you to implement either throwable or DateTimeInterface. I haven't quite realized that that these are also sealed classes really. What is sort of the motivation behind wanting to introduce sealed classes? Unknown Speaker 2:06 The main motivation for this feature comes from Hack the programming language. Hack contains a lot of interesting type concepts that I think personally, PHP could benefit from and sealed classes is one of those concepts. Derick Rethans 2:18 What kind of syntax are you proposing? Saif Eddin Gmati 2:21 The syntax I'm proposing actually there is three syntax options for the RFC currently, but the main syntax is inspired by both Hack and Java. It's more similar to the syntax used in Java as Hack uses attributes. Personally, I have been I guess, using attributes from the start as I personally see sealing and finalizing similar as both effects how inheritance work for a specific class. Having sealed implemented as an attribute while final uses a keyword brings more inconsistency into the language which is why I have decided not to include attributes as a syntax option. Derick Rethans 2:56 In my opinion, attributes shouldn't be used for any kind of syntax things. What they should be used for is attaching information to already existing things. And by using attributes again, to extend syntax, you sort of putting this syntax parsing in two different places , right? You're putting it both in the syntax as well as in attributes. I asked what the syntax is, but I don't think he actually mentioned what the syntax is. Saif Eddin Gmati 3:20 The syntax the main set next proposed for the RFC is using sealed and permit as keywords we first have the sealed modifier which is added in front of the class similar to how final or abstract modifiers are used. We also have the permit clause which is basically a list allows you to name a specific classes that are able to inherit from this specific type. Derick Rethans 3:43 So when you say type here, is that just interfaces and classes or something else as well? Saif Eddin Gmati 3:48 It's classes interfaces and traits. Traits are allowed to add sealing but they are not allowed to permit. Okay for example, an interface is not allowed to permit a trait because a trait cannot implement an interface Derick Rethans 4:03 In the language itself, when does this get enforced? Saif Eddin Gmati 4:06 This inheritance restriction gets enforced when loading a class. So let's say we are loading Class A currently if this class extends B, we check if B is sealed. And if it is we check if B allows A to extend it. But when loading a specific sealed class, nothing gets actually checked. We just take the permit clause classes and store them and move on. Derick Rethans 4:32 It only gets checks if you're trying to implement an interface. Saif Eddin Gmati 4:36 This gets enforced when trying to implement an interface, extend that class, or use it trait. Derick Rethans 4:41 Okay. What are general use cases for this feature? Saif Eddin Gmati 4:45 General use cases for a feature are for example, implementing programming concepts such as Option which is a type that can only have two subtypes. One is Some, other is None. Another concept is the Result where only two subtypes are possible, either success or failure. Another use case is to restrict inheritance. As I mentioned before, for example, logger trait from the PSR log package is a trait that implements some of the method methods in logger interface, and expects whoever is using that trait to implement the rest. However, there is no restriction by the language regarding this, we can seal this trait to a logger interface ensuring that only loggers are allowed use this trait. Derick Rethans 5:34 When you say that Option has like the value Some or None, just sound like an enum to me. How should I think differently about enums and sealed classes here? Saif Eddin Gmati 5:43 Enums cannot hold a dynamic value. You can have a value but you cannot have a dynamic value, however, tagged unions will allow you to implement option the same way. Tagged unions are that useful only for this specific case, there is some other cases such as the one I mentioned for traits that cannot actually be implemented using the tagged unions. There is also the I don't know how to say this. Let's say we have a type A that sealed and permitting only B and C. And this case A on itself, as long as it's not an abstract class, is by itself a type. Can be used as a normal class, you can create an instance and use it normally. However with tagged unions, the option itself would not be a type, you either have some or none. That's the main difference between tagged unions until classes Derick Rethans 6:37 A tagged union PHP doesn't have them. So how does a tagged union relate to enums? Saif Eddin Gmati 6:43 With tagged unions as the, there is an RFC that's still in draft, I suppose that uses actually it is built on top of enums that that's why. Derick Rethans 6:55 I reckon once that gets closer to completion, I'll end up talking to the author of that RFC. So something I'm wondering, can a sealed type permit only one other type? Or does it have to be more than one? Saif Eddin Gmati 7:10 No, it can permit only one type. Let's say we have class A that only permits B. However, another thing is class B does not actually have to extend A, like if A is permitting B, B does not actually have to implement A. It's still useful because another class called C can extend B and implement A, so an instance of A B can still exists. Derick Rethans 7:36 I'm not quite sure whether I understood that. If you have an interface that says A permits B, then B is not required to implement A, mostly because the moment you loads class B, you don't even know it exists, right? Because it doesn't refer to it. Saif Eddin Gmati 7:54 Yes. Derick Rethans 7:55 It's just going to break anything? Saif Eddin Gmati 7:57 Hopefully not. The only break would be in the new reserved keywords which are sealed and permits. So those cannot be used as identifiers any more, but depending on the syntax choice, if for example, the second syntax choice wins which that would only take the permits keyword. If the third syntax choice is chosen then no new reserved keywords will be introduced so there will be no breaks. Derick Rethans 8:29 From what I see in the RFC the first syntax is using both sealed in front of a as a marker and then using permits. With the second syntax, you don't use seal but you infer that it is sealed from the permits keyword I suppose. And then in the last option you use the for keyword instead of permits and also don't use sealed yet? Saif Eddin Gmati 8:51 The third syntax choice is will be the one with no breaks as we will not be introducing any new keywords; for is already a reserved keyword in PHP. Derick Rethans 9:02 What is your preference? Saif Eddin Gmati 9:03 Personally I prefer the first syntax choice as it's the most explicit. When you start reading the code you can tell from the start this is a sealed class without having to continue reading until you reach permits. Derick Rethans 9:15 I think I agree with you there. Beyond the syntax is there anything else that needs to be changed in PHP itself? Saif Eddin Gmati 9:22 The only other change that will be introduced in PHP is in reflection class. A new method called isSealed will be added to reflection method, which allow you to check if a class the class being reflected is sealed. Another method will be added called getPermittedClasses which returns the list of class names provided in the permits clause. Also a new constant should be added to reflection class that is is_sealed constant which exposes the bit flag used for sealed classes. Some changes will happen to the getModifiers method in reflection class. This method will return the bit flag is sealed set, if the class being reflected is sealed. The getModifierNames method will also return the string sealed if the bit is set, that should be about it. Derick Rethans 10:12 Basically everything that you need in reflection to find out whether it's a sealed class and other permits. Saif Eddin Gmati 10:18 Yes. Derick Rethans 10:20 See, I see the name of getPermittedClasses has to use, has the word classes in it. Does that mean that the types after permits have to be classes? Saif Eddin Gmati 10:32 No, they can be either classes or interfaces. But PHP refers to both classes and interfaces as classes in the reflection. So we have a reflection class, but that's actually a reflection trait class interface. And basically everything is class-ish. Derick Rethans 10:47 Class-ish. I like that. Did you look at some other alternatives to implementing the same feature or just the three syntax choices that you came up with? Saif Eddin Gmati 10:56 I did not consider any other alternatives precisely as the alternatives might be type aliases, tagged enums, package visibility. But I think each of these RFCs focused on a specific problem and expanding that area, while sealed classes focuses on all the problems mentioned on in this RFC tries to solve them in a minimal way. But only in relation to inheritance in classes, interfaces, and traits. Derick Rethans 11:24 Keeping it short and sweet. What has the feedback been so far? Saif Eddin Gmati 11:29 The feedback has been pretty mixed. Some people are against adding more restriction to types and inheritance. But in my opinion, this is not about adding restriction, but rather providing the user with the ability to add restrictions. And we already have final classes, which a lot of people seem to dislike. Derick Rethans 11:48 I don't understand why. But fair enough. Saif Eddin Gmati 11:51 I have created a community poll a couple of weeks ago to gather feedback on Twitter. The results were 60% for with over 150 participants. Another poll was created by Peter on Facebook ended with 54 of people voting yes. However, such polls that do vary depending on the audience. So it can be really an accurate representation of the PHP community. Derick Rethans 12:15 Polls on Twitter are never scientific, or they? I see that the RFC is in voting already. So for people listening to this, and if you have voting rights, then you have until when exactly? Saif Eddin Gmati 12:28 Until the end of the month. Derick Rethans 12:30 March 31. It says yes. Okay. Well, thank you very much for taking the time today Saif about sealed classes. Saif Eddin Gmati 12:37 Thank you for having me. Hopefully, I get to be here another time in the future. Derick Rethans 12:42 I hope so too. Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Sealed Classes Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 99: Allow Null and False as Standalone Types
PHP Internals News: Episode 99: Allow Null and False as Standalone Types London, UK Thursday, March 10th 2022, 09:04 GMT In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Allow Null and False as Standalone Types" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is episode 99. Today I'm talking with George Peter Banyard, about the Allow null and false at standalone types RFC that he's proposing. Hello, George Peter, would you please introduce yourself? George Peter Banyard 0:36 Hello, my name is George Peter Banyard. I work on the PHP language, and I'm an Imperial student in maths in my free time. Derick Rethans 0:44 Are you're trying to say you're a student in your free time or contribute to PHP in your free time? George Peter Banyard 0:49 I feel like at this time, it's like, both are true at the same time. Derick Rethans 0:53 Let's hop into this RFC. It is titled allow null and false as standalone types. What is the problem that it is trying to solve? George Peter Banyard 1:02 This is the second iteration of this RFC. So the first one was to just allow null initially, and null is the unit type In type theory parlance of PHP, ie the type which only has one value. So null is a type and a value. And the main issue is that when for leads more with like inhabitants, and like the Liskov substitution principle. If you have like a method, like the parent method, which can be told like either null or an object, and your implementation in a child class always returns null, for various reasons, maybe because it doesn't support this feature, or whatever is out, or... If your child method only returns null, currently, you can't document, that you can't type this properly, you can document it in a doc comment or something like that. But due to how PHP type handling works, you need to specify at least like another type with null in the union. Basically resort to always saying like mimicking the parent signature, when you could be more specific. This was the main use case I initially went into. Derick Rethans 2:08 If I understand correctly, you can't just have an inherited method that has hinted as to just return null? George Peter Banyard 2:14 Exactly. If you always return null, maybe because you always work or something like that, then you must still declare the return type as like null or exception, which is not a concrete because you say what, like why never fail. And like static analysers, if they can figure it out that you're using a child class, they can't maybe like do some assumptions or work further down that like what you're doing is redundant or things like that. So that's one of the main reasons I initially went with it. And I didn't add false initially, because it was like, well, false, it's not really a type properly. It's, it's what's called a value type. False is one value from the Boolean type. And I was like, Well, okay, we're just going to limit it to like, being the type theory purist, limited to proper types, where null is a proper type, although it's a bit sometimes misunderstood, I feel in the PHP community at large. And then people were like, well, if we add null, then by the only type-ish thing, which you can use in a type declaration, or whatever, which can't be used in a return type on its own, is false. And it's just weird. So why not add it in full. So that was the second thing as to why I added it. Some of PHP internal's functions being terribly designed because they were designed back in the early noughties, return null on success and false on failure, which you can't probably type at the moment. Currently, we need to type them as like Boolean or null, but true can never be returned in this case. And there are some other some other people have reached out to me it's like, well, yeah, but I always return false in this case. Or I also return always true in this case, although true, we have this weird asymmetry that we have false as a value type and not true. Derick Rethans 3:49 What was the reason for having false but not true? George Peter Banyard 3:53 When the union type RFC got discussed and passed for PHP 8.0, false was added, because a lot of traditional behaviour of PHP internal functions, was to return false on failure, instead of the technically more correct thing would be to return null. Because loads of functions return a false on failure, and saying that like in returns, these types, or a Boolean would be basically lying because you could never have true, false was included in it. With the restrictions that you can only use false as the complement with other types. So you need to do for example, array, or false, you couldn't just use false. Derick Rethans 4:37 Would it also mean that you can define a return type of a method that inherited a method that returns a bool, as false? George Peter Banyard 4:48 Yes, that would be now possible with the amended proposal. Yeah, which goes back to this weird a symmetry, we're probably. Adding true to make a complete would be a future RFC to do. Derick Rethans 5:00 Now, we've talked about return types. But I guess the reverse applies to arguments? George Peter Banyard 5:06 Arguments and property types also would, would be allowed to, like declare themselves as like null or false. The usefulness here is way more limited. Because if you declare an argument to be of type null, then basically you can only ever pass a null to it. And then therefore, the type doesn't do anything. Derick Rethans 5:26 But in an inherited method, you could then widen it. George Peter Banyard 5:31 Yes, exactly. You could always say: Well, this argument exists, it's always null. If you extend like your class or message, then you can add other types. But in theory, you can already do that by adding like an argument at the end of the message, because that's LSP compliant. The case for, and properties of those, because they are typing, they're in like their beads. Kind of debatable why you would do that. But it's just that like, well, if you accept types at one point, just restricting them like somewhere else gets very weird. At this point is more like look at the human review, or like use static analysis for the analyser to tell you like this argument is redundant and just remove it or this property doesn't make any sense. Because if it can only ever be null, why does it even exist in the first place? Derick Rethans 6:13 Right now, you can already use false in union types, but why not with null or false? George Peter Banyard 6:19 That goes back to the when a union type RFC got introduced. Null got added as a keyword. Before you could only use the question mark, before a type to make the type nullable. If you have a more complex union type, to not use the question mark in front of it. Therefore, the null keyword got added as a proper type. And because the logic was, Well, you shouldn't ever be able to return just null. Because then that function is kind of equivalent to void. Because of that, it was said that like, Well, okay, null and false basically have like kind of the same status is that like, if you just want to use null on its own, you're doing something kind of weird. And if you're returning more than false, like that signature is very strange. I think when that was discussed, nobody knew initially that an actual PHP function within one of the extensions, like in core had such a weird signature. Which mainly, we just started discovering that after this got, like accepted and we could like actually start properly typing the internal functions, and then you discover these weird edge cases where sounds like, that's a bit strange, can't properly document it. We just need to make like a note on the PHP documentation side. And like the type signature kind of lies to you. PHP's type hierarchy is a bit strange, void kind of lives on its own. So if the function is marked as void, it must always like any child inheritance, or whatever needs to be void. And when you type return in the function body, you need to always use return with like a semicolon afterwards, you can't even return null. Although, under the hood, PHP will always return a value when you call a function, even if the function is void, which will be null. Derick Rethans 7:58 The RFC also talks about question mark null, what is that supposed to be? Is that null or null? George Peter Banyard 8:03 PHP has this limited type redundancy checks at compile time. It will basically check if you're duplicating types. So if you write for example, int or int, even if it's capitalized differently, PHP was like, okay, just specifying twice the same type in this union. This is redundant. And then it will throw a compile error, we're basically saying, maybe you're just doing a mistake, maybe you meant something else. In the same vein, basically the question mark, gets like, translated into like, any seeing pipe null. And so if you write null with a question mark in front of it, it's just saying like, well, you're doing null or null, which is basically redundant. Therefore, you'll get like a compile time error telling you like. Derick Rethans 8:41 That seems sensible to me. What's been the feedback so far? George Peter Banyard 8:45 The most feedback, I think I've got it when I first proposed it in October. And at the time, it was like, Yeah, this is useful. And it's kind of needed because well, having always more type expressiveness is I think, good in general. But the main feedback at the time was like, Well, why not include false? The other feedback I got was basically, well, for consistency, what shouldn't you also add true? Yes, I do agree with this. I frankly, find it very strange that we landed in this situation where we only have one of these value types, either true and false, or none of them would make more sense to me. But that's expanding the scope. And it's kind of not this is not really concerned with this specific detail. Probably next, another RFC to do, for either myself or somebody else. It's just like propose true as a value type. Derick Rethans 9:33 Is the implementation of this RFC complicated? George Peter Banyard 9:36 It's very simple. It basically removes checks, because currently in the compile step, which checks for like type signatures, it needs to check that like, Well, are you declaring false or are you declaring null, and these checks get removed, so it makes the code a bit more streamlined. Oh, there's one change in reflection. For backwards compatibility reason, because of the fact of the question mark, any union type which is composed of a only two types, where one of them is null,will get converted in reflection to use the question mark notation, which is kind of a bit weird because it then gets converted into like a name type instead of a union type in reflection. But that's there, because of backwards compatibility reasons. I am breaking this into the more sensible reflection type. If you have a type of null and false, then you'll get a reflection union type instead of a named. From my understanding from reading the reflection code, the intention was always probably in PHP 9, to remove this distinction. So if you get a named type, it's only a single type instead of a possible nullable type. And any nullable types get converted into like a reflection union type when you have like null and the other type. Maybe this is a good test case to see if your code breaks. Derick Rethans 10:50 I would probably call that a BC break though. George Peter Banyard 10:53 This only happens if you do false union null. You can't use false currently on its own. And I think like, if you get false, as a named argument type, with like a question mark in front of it. Because it would be completely new, and you would never deal with it. It's like, well, this can also break because false can be in the Union type. If your library or the tool supports union types with the reflection thing, it will automatically know how to deal with false because it needs to know how to deal with it. And null. So that was kind of also the logic. It's like, well, okay, like if the tool supports that, which it needs to, then if you put this case into that bracket, it will work. It makes kind of the reflection code a bit more complicated at the moment. The whole fact that we need to juggle between like figuring out like, should we use the old like the backwards compatible thing reflection of like using a name type instead of the Union type, if there's a null and depending on the type, makes a reflection code unwindy and everything. And we have like a special function in C, which is basically just like, okay, which object do I need to create, depending on this type signature? Derick Rethans 11:53 When do you think you'll be putting this up for a vote? George Peter Banyard 11:56 I suppose I could put it up for vote immediately now. I am planning on maybe putting this on to vote at the end of the week or something like that. Derick Rethans 12:04 Well, thank you for taking the time today to talk about this RFC. George Peter Banyard 12:09 Thank you for having me on the podcast. Derick Rethans 12:13 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Allow Null and False as Standalone Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode
PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode London, UK Thursday, March 3rd 2022, 09:02 GMT In this episode of "PHP Internals News" I chat with Rowan Tommins (GitHub, Website, Twitter) about the "Deprecate and Remove utf8_encode and utf8_decode" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 98. Today I'm talking with Rowan Tommins about the "Deprecate and remove UTF8_encode and UTF8_decode" RFC that he's proposing. Hi, Rowan, would you please introduce yourself? Rowan Tommins 0:38 Hi, I'm Rowan Tommins. I'm a PHP software architect by day and try and contribute back to the community and have been hanging around in the internals mailing list for about 10 years and contributed to make the language better, where I can. Derick Rethans 0:57 Excellent. Yeah, that's how I started out as well, many, many more years before that, to be honest. This RFC, what problem is this trying to solve? Rowan Tommins 1:08 PHP has these two functions, utf8_encode and utf8_decode, which, in themselves, they're not broken. They do what they are designed to do. But they are very frequently misunderstood. Mostly because of their name. And because Character Encodings in general, are not very well understood. People use them wrong, and end up getting in all sorts of pickles that are worse than if the functions weren't there in first place. Derick Rethans 1:37 What are you proposing with the RFC then? Rowan Tommins 1:39 Fundamentally, I'm proposing to remove the functions. As of PHP 8.2, there will be a deprecation notice whenever you use them, and then in 9.0, they would be gone forever, and you wouldn't be able to use them by mistake, because they just wouldn't be there. Derick Rethans 1:56 I reckon there's going to be a way to actually do what people originally intended to do with it at some point, right? Rowan Tommins 2:02 So yeah, there are alternatives to these functions, which are much clearer in what you're doing, and much more flexible in what you can do with them so that they cover the cases that these functions sound like they're going to do, but don't actually do when you understand what they're really doing. Derick Rethans 2:20 I think we'll get back to that a little bit later on. You're wanting to deprecate these functions. But what do these functions actually do? Rowan Tommins 2:27 What they actually do is convert between a character encoding called Latin-1, ISO 8859-1, and UTF-8. So utf8_encode converts from Latin-1 into UTF-8, utf8_decode does the opposite. And that's all they do. Their names make it sound like they're some kind of fix all the UTF 8 things in my text. But they are actually just these one very specific conversion, which is occasionally useful, but not clear from their names. Derick Rethans 3:01 It's certainly how I have seen it used in the past, where people just throw everything and the kitchen sink at it, and expecting it to be valid UTF 8, and then at the end, decode. I mean, the decoding was not even part much of this, right? It's just throw everything at it, and then magically it will all be UTF 8. But I reckon that's not really quite the case. When and how does that go wrong? Rowan Tommins 3:26 So what actually ends up happening is, because text doesn't know what encoding it's in. Something that people misunderstand about character encoding is they think it's like, the text is a certain colour, and the computer knows what colour it is. And if you tell the computer to make it a different colour, then it will work. But it's not like that. In the computer, there's just the sequence of binary. And the encoding is how to read that binary as text. And if you tell the computer to read it as Latin 1, it will read it as Latin 1. If you take to convert from Latin 1 to UTF 8, it will assume the input is Latin 1, it will convert to UTF 8 on that basis. If your text actually wasn't Latin 1 in the first place, you're just going to end up with garbage. And some of the worst cases of that is when you already have UTF 8, and then you run utf8_encode on it, because the language doesn't know that you've already got UTF 8, so it tries to read its Latin 1, write it out ass UTF 8 and you get this weird Mojibake. I don't know pronouncing that right. Derick Rethans 4:27 I think it's pronounced Mojibake. Rowan Tommins 4:30 Mojibake. Derick Rethans 4:31 It's a Japanese term, because clearly these things, these issues happened with Japanese text quite a lot because they have a lot more different and difficult characters and encodings as well. With which things often go wrong though? Rowan Tommins 4:44 Using an unco on text that's already UTF 8 is obviously a big one. Usually obvious, but occasionally people just getting a muddle with that. The other thing that often happens is confusing with similar encoding. Latin 1 is often mistaken for a different coding windows 1252. To the extent that web pages labelled as Latin 1, web browsers will assume that they're actually in Windows 1252. These PHP functions don't make that assumption. If your text is actually in Windows 1252, and it's been mislabelled Latin 1, you might still think you're doing the right thing. So I've got Latin 1 text, but you haven't. And then the characters that are different, are going to get mangled again. And there's a few other related encodings that often look the same. There are a few other encodings that look the same at a glance that again, will go wrong on any character that's different between the different encodings. Derick Rethans 5:43 How could a function tell which encoding a certain text was in? Rowan Tommins 5:49 It's tricky. There are libraries out there that try to do it. Some encodings that are sequences of bits that aren't a valid character. So if any of those appear, it's definitely not in that encoding. Unfortunately, a lot of encodings, every pattern of bits has a meaning. It's just not necessarily mean. So you can't look at the string and just tell at a glance. The only way I've seen that does it effectively, is trying to guess based on what language text it might be in. If your text suddenly has a load of symbols in the middle of sentences, you're probably using the wrong encoding. If it's suddenly got a load of capital letters, in the middle of words, you're probably using the wrong encoding. So you can make guesses like that, that ultimately, there are only ever guesses. Derick Rethans 6:38 It's only always going to be a guess, right? You can't really tell for certain what it it is, which I've seen people assume that she can just tell. We have concluded that utf8_encode and decode don't actually do what they say they don't magically encode everything to UTF 8. What if things go wrong? How are errors handled? Rowan Tommins 6:58 If you're converting from Latin 1 into UTF 8, there Latin 1 covers all 256 possible eight bit binary strings. Those will correspond directly to a single mapping in Unicode and therefore in UTF 8. So there are no errors as such, when that happens, but it might not be what you want. One of the most notable ones that's different between these encodings is Latin 1 was standardized in 1985, the Euro didn't exist, then. The euro symbol doesn't have an encoding in Latin 1. If you've got a euro sign, you haven't got Latin 1 text, but you might think you've got Latin 1 text, and it will just encode it to what to a control character, which is where the windows 1252 code page puts the euro symbol, it replaces some control characters in Latin 1. One of the reasons why these character encodings are so easily confused is they've all nicely built to being compatible on top of each other. Latin 1 is deliberately an extension of ASCII. Windows 1252 is deliberately an extension of Latin 1, replacing some control characters. UTF 8 is also based on Latin 1, the first section of Unicode is actually the Latin 1, characters UTF 8 will encode and slightly differently so that it can carry on above 256. So in that direction, you can't actually get an error, you could just get a string, that doesn't make sense. Going back the other way. Unicode has, I think, potentially 11 million or something, and actually, at least a million assigned code points. Latin 1 only has 256. So you can't map all those back. And this function, the utf8_decode just replaces any that it can't match with the question mark. Similarly, if the input string isn't valid UTF 8. Again, if you've just misunderstood what strings doing and you haven't actually got a UTF 8 string in the first place, any sequence that doesn't look like valid UTF 8, again, just gets replaced with a question mark. Completely silently you get no warnings in your logs or anything. So you'll just get a few question marks. And problem is, a lot of people are writing text, mostly in English. So it's mostly ASCII. And all of these encodings agree on those first 127 things including all the letters and digits, most of your text will look fine. But if you're using utf8_encode, some of the accented letters will just look a bit funny. If using utf8_decode some of the characters will just turn into question marks. And you might just not notice that for a while until your applications been in production. And now all your strings a messed up. Derick Rethans 9:48 And I reckon that there's no way to fix that? Rowan Tommins 9:52 No. If you've saved saved the text, particularly with the decode direction. Run utf8_encode wrong, if you're careful and tracked carefully where what you've used, you can retrace your steps back to the original string. But if you've not understood what it was doing in the first place, you might have run it more than once, or put it into a system and then re interpreted it in a different way. And it can sometimes be quite hard to trace back what the original string was. You'll sometimes just have to edit it by hand. And guess that, oh, that's probably any acute because that was the word that was trying to be there. That was probably a curly quote mark that somebody was trying to type and those kinds of things. Derick Rethans 10:35 Talking about curly quote marks, I just found out that those are actually are code points in the windows 1252 encoding. Because I just had to edit a document that had these things in there. But the file was set as... this is UTF 8, which was a lie. It was a lie to begin with. We've established that these functions are pretty much destructive to text potentially, as well as not really doing what they say they do: encode every random stuff to UTF 8 or the other way around. I saw any RFC that you've done some research into their usage, didn't bring up anything interesting to talk about? Rowan Tommins 11:13 Yes, so there's a few things. So what I downloaded, it was last year, actually, I kind of had to pause on this RFC for real life happened a bit to me. So last year, I downloaded the 1000, I think top packages on Packagist, I'm most popular downloads, and went through all the uses, I could say of these functions. There were a handful that were using them correctly, they were checking that their input was Latin 1, or the output they needed was Latin 1. And using these, there were a few of those that were questionable, where they might have mistaken Latin 1 for Windows 1252. And actually, they were going to mess up any Euro signs or any of those few extra things that Microsoft added over the top of those control characters. There were a few using strftime, which can do translated Date Time strings. Those it turns out that functions been deprecated itself now, that will become a non issue, some people will have to find a different solution to that anyway. One of the odder ones that I've seen, which technically works, but only accidentally is people use it for what I describe as armour, where they've got a system that wants UTF 8 text, often encoding as JSON or something like that, where it needs to be UTF 8, they've got some unknown encoding that's not UTF 8, they encode to UTF 8, transmitted through the system. And then on the other end, run utf8_decode and they'll get back the string that they put in, because it never errors, there will always be a mapping of any string of bits that this function will give in UTF 8, it just won't be a meaningful string. You could put a JPEG image through utf8_encode, and you will get a string that is valid UTF 8, it's just not going to be very useful UTF 8. It's kind of a bit of a weird way of doing the thing you might do with base 64, or quoted printable encoding or something like that almost something for transport, it technically works. But this probably isn't the function you want to be doing it with. It's not a very useful encoding. And then there were a good number, which just tried throwing all the functions they could. And I kind of I don't want to call out the people with this. I think they were genuine mistakes, they were genuinely trying to solve a problem. But some of them just in hindsight looking at them or kind of hilarious. I think the one that makes me laugh most is the person who raised the StackOverflow question because their CSV file, some of the fields had grown to 32 kilobytes long, because they'd repeatedly run the same string through utf8_encode so many times, that each time it was encoding a single byte to multiple bytes, and then single bytes of that to multiple bytes. And only when it got to 32 kilobytes in one field, did they question whether they were doing the right thing? By which time their text was probably irrevocably lost in whatever other processing they've done on this file. Derick Rethans 14:22 Excellent encryption. Rowan Tommins 14:24 Yes. Derick Rethans 14:25 The RFC talks about a few other approaches to instead of deprecating utf8_encode and decode. What are the things that you look at? And why did you reject them in the end? Rowan Tommins 14:36 One of the most obvious things you could do? The biggest problem is the name of the functions. Could you just rename them? The problem with that is you'd have to spend a long time doing it because you want to introduce the new name in one version of PHP, then deprecate in a later later version of PHP, and then finally remove. And then at the end of it, you'd have these very specific functions. We could call them latin1_to_utf8 and utf8_to_latin1. If we were designing those functions, if you put an RFC to, to add those functions to the language, it wouldn't pass. There's they're very why, why would we have these specific functions, and we'd still have this problem of Windows 1252, and other related encodings, like Latin 9, which is the official successor to Latin 1, and also has a few differences amongst it. They still wouldn't solve a lot of people's problems. A lot of the people that actually want Latin 1 are going to need the euro symbol. So they don't probably don't actually use Latin 1 any more. Because I guess Canadian French, and Mexican Spanish, need to probably that in one's probably still a decent encoding for but the Western European languages it was originally designed for, probably everyone's going to want a euro symbol. Changing the name just leaves us with these awkward functions still. You could instead or as well add options to them, you could add a parameter to them that indicated what the source or destination encoding was. That defaulted initially to Latin 1, and then you were forced to add it later. And then at least you'd be spelling out what encoding it was. The problem with that is, the more encodings, you add, there's actually quite a lot of code that would need to then be added to the function, and it will be duplicating functions we've already got. Derick Rethans 16:31 Such as? Rowan Tommins 16:32 So we've actually in PHP got three functions that can convert between any pair of encodings, including the ones that these functions do. They're all unfortunately in extensions, which are technically optional. Which is something that the way PHP is modular, means that a lot of things that you'd think were kind of just part of the language are technically optional, for one reason or another. But we've got mb_convert_encoding from the mbstring extension. We've got iconv, which uses an external library of the same name. Derick Rethans 17:09 Are you sure it just doesn't use a GCC function or the glib functionality in PHP? Rowan Tommins 17:14 The iconv function uses whatever iconv is available on the system, and seems to vary quite a lot between systems. Oddly, one online code running tool I tried, doesn't actually recognize 8859-1 as an encoding in the iconv function. I don't know why. Just something about the libraries, that version of PHP was built, built against. The most powerful one we've got but also the least documented is the intl extension, which is built on the ICU library, made by the Unicode Consortium. That has a lot of options around how you handle errors and missing characters and supports a lot of different character sets. Some was completely undocumented, I've tried to write a manual page for it, which will hopefully get merged and put live soon. So at least, there will be some documentation there's a, there's an object that you can use with lots of options. But there's a static method, which just takes a from and to encoding. So that's one option. The mb_convert_encoding is probably the most widely available. And maybe we should be looking at making that MB string, less optional. I don't know what that looks like, because of the way, unless you force people to compile it in a lot of the Linux distros. Distribute every module they can separately, they make optional. Derick Rethans 18:39 But they also make it easy for you to install them then. Rowan Tommins 18:42 They make it very easy to install. So I don't know how many people actually run PHP with just its minimal set of modules. And how many just install a default set. The default set is a bit vaguely defined, unfortunately. So that's one of the my main hesitation with this removal, that although we've got these alternatives, we've got these three alternatives. They've all got slight problems, and they're all optional. Derick Rethans 19:08 But considering that utf8_encode and decode don't actually really do well, they say they do, everybody that had to do character set conversions correctly, would have already been using these functions. Rowan Tommins 19:23 Indeed, yes. So I've seen people misuse all of these. Again, people do just generally misunderstand character encoding. MB string does have a function to guess character encoding. As you're saying earlier, people just kind of assume that that will work. A lot of the time, it can't really tell the difference between different character encodings. It can tell you whether a string is valid UTF 8, it can't tell you whether it's Latin 1 or Windows 1252, or any of these others that are single byte encodings. Derick Rethans 19:52 I think ICU actually as functionality for guessing an encoding as well, but it will give you back an array of possibilities and perhaps even with a confidence. But it's a long, long time since I've looked at that. So I'll have to revisit it. Rowan Tommins 20:08 Yeah, that would at least be a more kind of transparent way of doing it that. And that's I guess what I'm trying to do with removing these, is that if you're forced to specify a pair of encodings, as you do for these other functions, at least hopefully, somewhere in your mind, you're going to be thinking about what encodings you might have, rather than just reaching for the first function you find. Derick Rethans 20:31 Yep, exactly. What is the feedback being so far? Rowan Tommins 20:34 Generally positive. There hasn't been a lot of a lot of comments. But those that have been have generally been supportive. I liked somebody said: All the times they've seen it used, including when they've used it themselves, it's been a misunderstanding. I'd like to hear more feedback of anyone. Anyone does have quite. The main feedback I have had has been around making sure there are alternatives to recommend to people. So anyone who is using these correctly, or nearly correctly, what we tell them to use instead, how do we make sure that's clear, and clearly documented, and we're recommending the right thing. I'm going to think a bit more about that, whether we should be being more definite in recommending one of these options. Particularly I think iconv does seem to have these odd platform issues. They used to be a fourth option. While I was looking at this, they used to be another library called recode. That one seems to have been discontinued. Some references in the PHP manual still refer to recode as an optional option for doing this. But that's been long since shelved. So MB string has the benefit that it doesn't rely on any third party libraries. It's technically a third party library, but it's shipped with PHP, and I don't think anything other than PHP uses it any more. And there have been a lot of there's been a lot of work on that library recently, particularly somebody called Alex Douward, apologies, if you're listening to this, and I pronounce your surname wrong, has done a lot of great work. I've seen recently improving that extension, making sure the detection algorithm is doing as sensible results as it can and improving the test test coverage of that extension and things like that. So that gives me a bit more confidence in that extension, which initially was one of those PHP reinventing the wheel, it felt a bit like, so probably update the RFC to more explicitly say, that's the number one recommended path. Derick Rethans 22:27 And of course, you can link that from the utf8_encode and utf8_decode manual pages as well. Please don't use this instead, do this, right? Rowan Tommins 22:36 Yeah. And that's again, where it can be a nice clear drop in replacement, so that people are using it right. Here's exactly what to what to use instead. But hopefully, while they're replacing it, they may be at least think about whether it was doing what they what they were hoping for in the first place. Derick Rethans 22:55 When do you think you'll be bringing this up for a vote? Rowan Tommins 22:59 Unless I get more feedback, further changes? I'll probably tweak that wording in terms of the recommendation that we'll put to users. Otherwise, probably in the next couple of weeks, unless I hear any more, to see if any last minute criticism comes out the woodwork when people are asked to vote on it. Derick Rethans 23:18 Yeah that always happens, right? No comments when there isn't a request for comments. But loads of comments if people are voting on it, and it makes it to Twitter. Okay, Rowan, thank you for taking the time today then to talk about this RFC. Rowan Tommins 23:32 Thank you very much for having me. Derick Rethans 23:39 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Deprecate and Remove utf8_encode and utf8_decode Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 97: Redacting Parameters
PHP Internals News: Episode 97: Redacting Parameters London, UK Thursday, January 27th 2022, 09:09 GMT In this episode of "PHP Internals News" I chat with Tim Düsterhus (GitHub) about the "Redacting Parameters in Back Traces" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:00 Before we start with this episode, I want to apologize for the bad audio quality. Instead of using my nice mic I managed to use to one built into my computer. I hope you'll still enjoy the episode. Derick Rethans 0:30 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 97. Today I'm talking with Tim Düsterhus about Redacting Parameters in Backtraces RFC that he's proposing. Tim, would you please introduce yourself? Tim Düsterhus 0:50 Hi, Derick, thank you for inviting me. I am Tim Düsterhus, and I'm a developer at WoltLab. We are building a web application suite for you to build online communities. Derick Rethans 0:59 Thanks for coming on this morning. What is the problem that you're trying to solve with this RFC? Tim Düsterhus 1:05 If everything is going well, we don't need this RFC. But errors can and will happen and our application might encounter some exceptional situation, maybe some request to an external service fails. And so the application throws an error, this exception will bubble up a stack trace and either be caught, or go into a global exception handler. And then basically, in both cases, the exception will be logged into the error log. If it can be handled, we want to make the admin side aware of the issues so they can maybe fix their networking. If it is unable to be handled because of a programming error, we need to log it as well to fix the bug. In our case, we have the exception in the error log. And what happens next? In our case, we have many, many lay person administrators that run a community for their hobby, they're not really programmers with no technical expertise. And we also have a strong customers help customers environment. What do those customers do? They grab their error log and post it within our forums in public. Now in our forum, we have the error log with the full stack trace, including all sensitive values, maybe user passwords, if the Authentication Service failed, or something else, that should not really happen. In our case, it's lay person administrators. But I'm also seeing that experienced developers can make this mistake. I am triaging issues with an open source software written in C. And I've sometimes seeing system administrators posting their full core dump, including their TLS certificates there, and they don't really realize what they have just done. That's really an issue that affects laypersons, and professional administrators the same. In our case, our application attempts to strip those sensitive information from this backtrace. We have a custom exception handler that scans the full stack face, tries to match up class names and method names e.g. the PDO constructor to scrub the database password. And now recently, we have extended this stripping to also strip anything from parameters that are called password, secret, or something like that. That mostly works well. But in any case, this exception handler will miss sensitive information because it needs to basically guess what parameters are sensitive values and which don't. And also our exception handler grew very complex because to match up those parameters, it needs to use reflection. And any failures within the exception handler cannot really be recovered from, if the exception handler fails, you're out of luck. Derick Rethans 3:51 Quite a few things to think of to make sure that you're not sharing any secrets. And I certainly have seen almost doing this myself. We now know what the problem is. How is this RFC proposing to fix this? Tim Düsterhus 4:03 Primarily, we want to propose a standardized way for applications or libraries to indicate which parameters hold sensitive values. Our custom exception handler uses reflection as we said before, and it only matches up the parameter's names, but we also have this attribute I am proposing, SensitiveParameter within our application itself. Any parameter names that are not definitely sensitive can be attributed with this attribute. But this only works within our software, but not with any third party libraries we are using, e.g. for encryption or whatever there is. Primarily we want to propose a standardized way an attribute that is in PHP core, anyone can use that and everyone knows what this attribute means. Secondarily, the RFC is proposing a default implementation to keep the exception handler simple. As I said before, we are using reflection. This is very complex, it does not work with the require_once or include_once family, because that are not functions. We need to handle this case to not try to attempt to reflect on those non functions when redacting any parameters. This is complex. And we want to simplify that. Derick Rethans 5:20 From what I understand this is then a way to make sure that there's a standardized method for marking arguments as being sensitive. And because this is that now standardized, only one solution to the problem has to be found right? Tim Düsterhus 5:34 Basically, not every library is using their own attributes, possibly, or we can match parameter names that are not like password, secret, but it can be documented: hey, if you are using sensitive parameters, you should put this attribute and then those exception handlers will be aware that this attribute is sensitive and can strip it, or in case of the RFC PHP itself, will already strip those parameters from the stack trace. Derick Rethans 6:04 You're suggesting that PHP standard way of showing stack traces also takes care of the sensitive parameter here? Tim Düsterhus 6:11 Yes, exactly. Derick Rethans 6:13 Which internal PHP functions are likely to get this attribute? Tim Düsterhus 6:16 Basically anything with a parameter called password or secret, as I said before, examples include PDO's constructor, the database password will be in there and possibly also the user name or host name, which might be considered sensitive. But the password is the most important thing I have on my list. ldap_bind, which possibly includes user passwords; the password_hash function; possibly various OpenSSL functions. One will need to look and this list can be extended in the future as well, if someone realizes we missed anything. Derick Rethans 6:55 Now, I know sometimes that there's a problem where an application connects to the wrong server with PDO. And as you say, the host name was also in this PDO constructor, would it not then make debugging that specific case harder because the hostname would also be redacted from the stack traces? Tim Düsterhus 7:14 The attribute I am proposing as the parameter attribute, each parameter can be sensitive or non sensitive. We would need to decide whether we consider the hostname sensitive or not. It usually is not. So I would not put the attribute on the host name, or on the DSN string in the first parameter. The password definitely is sensitive. And the username possibly is a grey area. By default, I probably would not put the attribute there. But this is something that needs to be discussed in the greater community possibly. Derick Rethans 7:47 I saw in the RFC that when you request a stack trace in PHP with get back trace or whatever the name of this function is, is that the sensitive parameters are being replaced by an object of the class SensitiveParameter. Why did you pick that instead of just a string, saying something like "redacted". Tim Düsterhus 8:06 We cannot force users to put the attribute only on parameters that take strings. If we use a redacted string we might violate the type hint. If a function takes some key pair class, or an option of a key pair class, this usually is a sensitive attribute, we cannot simply put a string there. We can but then we would violate the typing. And as we violate the typing in at least some of the cases, we can also violate it in all of the cases and then make it very clear that this parameter was redacted and not a real value that just looks like a string "redacted". Exception handlers would be able to use an instanceof SensitiveParameter check to possibly make it more user friendly when they render the stack trace. When you using an GUI to handle your exceptions as such a Sentry can show some placeholder instead of pretending it's a real string in there. Derick Rethans 8:07 And of course, the string "redacted" can already exist as an argument value yet anyway, right? Tim Düsterhus 9:12 Yeah. Derick Rethans 9:13 Where would attribute be checked? Tim Düsterhus 9:16 My proposal would extend PHP to check this attribute within the function that generates the stack trace, because as I said, I want to keep my exception handler simple, so they won't need to use reflection to check this attribute. PHP itself will check this attribute when the stack trace is generated. So no exception handler can miss to check this attribute. Derick Rethans 9:39 Would it be possible for code that checks for SensitiveParameter to see what the original value was? I can imagine that in some cases, an exception handler as part of a debugging toolbar, whatever does want to show this extra information, although there's going to be hidden by default. Tim Düsterhus 9:58 Not with the current version of my RFC, but I can imagine that this sensitive parameter replacement value gets an attribute where the original value can be stored. Care would need to be taken, so exception handlers don't simply serialize that value and ship it to a third party service, basically negating the benefit. But a future extension, or maybe the further discussion of my RFC can extend this replacement value. So you can use sensitive parameter, arrow, original value, or whatever. Derick Rethans 10:34 In PHP attributes are basically markers on parameters or arguments. But they don't necessarily have to have an object implementation. Is your RFC also including the SensitiveParameter class that PHP core implements? Tim Düsterhus 10:51 Yes, in my current RFC, and my current proof of concept implementation, I'm just reusing that attribute class as the replacement value within the stack trace. So we can kill two birds with one stone by doing that, by including proper class, also, any IDE will be able to see that class and know where that attribute can be applied. Because attributes have a property where they say where they can be applied in this case parameters only. And by putting it on the method by accident, you will possibly get an error or the IDE can warn you that you're doing this not correctly, Derick Rethans 11:32 You might be aware that I work on Xdebug, a debugger for PHP. And in many cases, some of the users have already previously said that Xdebug should, for example, follow the debug_info() magic method on objects to show redacted information. Now, would you think that when people debug PHP with a debugger such as Xdebug, should they see the contents of the arguments that are set with SensitiveParameter, or should it stack traces show the real value? Tim Düsterhus 12:07 In case of debugging, you're not usually not in production. So within your debugging environment or development environment, you shouldn't really have any sensitive value such as passwords, or credit card numbers, or whatever there is. In that case, debugability and ease of development should be more important. Xdebug, or any other debugger should see through those sensitive attributes and show the real value, possibly with an indicator that this value would usually be sensitive. But you shouldn't need to work around PHP hiding something from you, because you really want or need to see what happens there. Derick Rethans 12:48 Now Xdebug also override PHP's standard exception handler, and then creates a stack trace of its own. Do you think that should redact the SensitiveParameter arguments? Tim Düsterhus 13:00 I'm not really sure if people run this in production. If this is something people usually do, then of course, Xdebug should make sure to redact those values, possibly with a special ini flag or something. If that's only used in development. In my case, I only use Xdebug in development and production servers don't have that; you don't really connect to your production server with your IDE and then step through the code. That does not happen. So we don't need Xdebug in production. Derick Rethans 13:32 I know some people do run Xdebug in production. But I also don't think those are the people that care about leaking sensitive parameters. I think the RFC talks about a few existing features that PHP already has for redacting some values. What are these? And how are they not sufficient? Tim Düsterhus 13:49 There are two php.ini values you can set. One of those is do not collect parameters in stack traces, I don't have the exact name. But basically, all functions will just show an empty parameter list within the stack trace. That makes debugging very hard, especially with PHP and the non-strict typing, it can happen that you pass some completely invalid value to a function, even in production after testing and such. And you really want to know about this value, because it makes debugging very hard. Not collecting the parameters makes the stack traces much, much less useful. So this targeted redaction, as I'm proposing, hides the sensitive values but the non sensitive values will still be visible. And the other one is that the length of collected strings within the stack place can be configured. By default. I think it's on 15, but 15 characters already include user passwords such as password, exclamation mark, or 12345. And also credit card numbers will be exposed to three fourths by then. And the last four digits are shown in clear text on many pages. So that doesn't really help with those type of user credentials. Of course, your database password might be 40 characters completely random. But that's not really the values you want, or need to protect, because the database server will not be exposed to the internet, in many cases. Derick Rethans 15:33 What has the feedback been so far to this RFC? Tim Düsterhus 15:36 Both positive, and "we don't need that nobody does that". It's a bit mixed. I've got some very good feedback. There's a Twitter account that tweets any new RFCs. And so the users on Twitter, the actual users, and not PHP internals list seem to be very happy with my proposal. On the list, many said, just don't log that values, or they don't really see the benefit yet, I think. Not really sure how the feedback is really. Derick Rethans 16:07 That's always a tricky thing, isn't it? Because the people that think "Oh, this is all right", often bother responding, because they don't have anything to add or criticize. Tim Düsterhus 16:17 Exactly. People that are happy won't write any reviews for whatever, just the people that complain are complaining. Derick Rethans 16:24 Yeah, it's either the people that are complaining are the people that are really happy about something. Are you expecting there to be any backward compatibility breaks? Tim Düsterhus 16:34 Yeah, obviously, when the attribute class name will be taken by default by PHP, userland code cannot use that any more. But I don't think that anyone is using a SensitiveParameter class in the global namespace. I used GitHub search and SensitiveParameter in PHP code only appears in some strings, in the AWS SDK or something like that. The replacement value will break any type signature. So if the exception handler checks, the original parameter types for whatever reason, that will, or might break, but I don't really think that's likely either. I don't expect any major backwards compatibility breaks. Derick Rethans 17:17 That's good to hear. And also good to hear that you have done some research into this. Do you have any extra selling points to convince people? Tim Düsterhus 17:26 My initial selling point was PDO's constructor. Or not really selling point, but example, because it's very obvious and it's in PHP core. I later expanded that with the credit card numbers and user passwords, and made, attempted to make this more clear that those sensitive values are not just values from your personal computing environment, but also something user input into your application. And that stack traces will be sent to third parties e.g. Sentry, which might even be run as a software as a service solution. And then your deep in GDPR territory. You don't want that. Derick Rethans 18:03 No, absolutely not. Tim, thank you for taking the time this morning to talk to me about your RFC. Tim Düsterhus 18:10 Thank you for having me. Derick Rethans 18:15 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening. I'll see you next time. Show Notes RFC: Redacting parameters in back traces PHP RFC Bot on Twitter Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 96: User Defined Operator Overloads
PHP Internals News: Episode 96: User Defined Operator Overloads London, UK Thursday, December 16th 2021, 09:24 GMT In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "User Defined Operator Overloads" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 96. Today I'm talking with Jordan, about a user defined operator overloads RFC that he's proposing. Jordan, would you please introduce yourself? Jordan LeDoux 0:33 My name is Jordan LeDoux. I've been working in PHP for quite a while now. This is the second time I have ventured to propose an RFC. Derick Rethans 0:44 What was the first one? Jordan LeDoux 0:45 The first one was the "never for parameter types", which was much more exploratory. And we talked about it a little bit. And it generated a lot of good discussion that contributed to kind of the idea formation, which was what I hope to get out of it. Derick Rethans 1:01 Okay, but that didn't end up making it into a PHP release. As far as I understand, right? Jordan LeDoux 1:07 No, I withdrew it actually, it was clear that the better way to approach the problem it was trying to solve was with a much more comprehensive solution. That particular solution was something that only required a seven line change to the engine. So I wanted to see if it was something people were okay with, or thought was a decent idea for that particular problem, much more comprehensive, like template classes, or something like that is probably the better route to go. Derick Rethans 1:35 Well, I think the RFC that we're talking about today, is going to require quite a bit more than seven lines of code? Jordan LeDoux 1:41 Quite a bit more. Yeah. Derick Rethans 1:42 So what is this RFC that we're talking about today? Jordan LeDoux 1:45 Well, user defined operator overloads is a way for PHP developers to define the ways in which objects interact with specific operators. So for instance, the plus operator, the plus sign. It's a way for those objects to kind of define their own logic as far as how that's handled, which right now, as of PHP 8.0, those were all switched to type errors. So it's not possible currently to write any code that doesn't result in a fatal error, where objects are used with operators. Derick Rethans 2:25 Usually, I ask about every RFC, what problem are you trying to solve this? So what problem are you trying to solve this RFC? Jordan LeDoux 2:31 The biggest problem that this solves is that objects contain, so objects in most programs represent a value or multiple values that have a program context. That's the most powerful thing about objects is they're contextual, and they understand the state, they understand what state the object is in, and sometimes even what state the whole program is in. And that's necessary for a lot of things. Like for instance, if you're tracking a distance, you know, you might measure that meters, and that would have a number you might have 30 meters of distance, but it also has a unit of meters. You could just represent that as an int. And then the program just knows internally, hey this is always in meters. But if you need to convert that to a different unit, then that becomes: Okay, well, now I need a special case some things, or I need a function just for converting, and I need to remember which unit my number is in. In a lot of cases, you handle that with objects because objects understand state, and they understand state transitions, which is what a lot of methods are about; transitioning the state of the object from one state to another. Operators are also about state transitions. And they're about very specific kinds of state transitions. It's natural in a lot of ways to think that you, you should be able to define how those two things interact. But currently, it's just not possible within PHP. Derick Rethans 4:00 Well, does them this magic operator overloading? Jordan LeDoux 4:04 It allows PHP developers to define an implementation logic, which is much like you define a function body that describes how does this object interact with this operator. That's essentially it. There's a lot of other details as to how it does that and what are the restrictions, but that's really the core of the idea. Derick Rethans 4:26 And in what kind of situations would you use that? Jordan LeDoux 4:28 A lot of them are situations where you're doing very complicated mathematics, or scientific computing or machine learning or things of that nature, where you are going to routinely encounter numbers that have state to them or that have multiple dimensions to them. So for instance, vector mathematics is one where the way that vectors interact with a lot of the operators that we're familiar with, like the multiplication sign is very different than how the number five interacts with the multiplication sign. Complex numbers is another one, you know, to multiply two complex numbers together, you have to treat it like a polynomial where you're multiplying it with the FOIL method: first, outside, inside, last. You know, there's a lot of those sorts of circumstances. But it also could potentially be very useful for some things that are not really mathematical but more quality of life for PHP developers. For instance, scalar objects is something that a lot of developers in PHP have, you know, wanted for a while. It's a thing that's a little more difficult to pin down, how exactly would you go about doing this within the engine, and it's a thing that the engine would kind of have to be very opinionated about by its nature. PHP developers can't provide their own scalar objects. And the main reason for this is that scalars interact with operators and objects can't. So simply allowing PHP developers to define a way for objects to interact with operators would allow user land to develop their own scalar object replacements. It wouldn't make every scalar that object; scalar objects within the engine still has, it's a separate feature. And it's still a thing that would be desirable, probably to a lot of people. But it gets quite a bit of the way there. Derick Rethans 6:20 It is always interesting that people come up with the example of complex numbers, because I'm not sure how useful that is in a PHP user land context. And then beyond the scalars, I then sometimes struggle to see where this could be used. With the only exception is probably doing calculations with money related issues. The moment you bring up operator overloading, you'll also get people to say that this is going to get abused. Examples of that, in my opinion at least, is where in C++ you have like the << operator to put things into the stream and stuff like that. What answer would you have to kind of comments? Jordan LeDoux 6:58 Abuse of operator overloads to do things that can create unmaintainable code, because that's really the concern for developers is, does a language feature promote code that's difficult to maintain, that's difficult to understand, that's difficult to follow, and develop, and you know, work with. The RFC, the way that I've gone about this implementation, has had that in mind, because I also have experienced that. This is not a thing where I coming down from the academic high tower with, you know, whatever my my concept of this is, and no, no real world experience with these things. I share a lot of those concerns. Actually, I think this is a very useful feature that has a lot of applications I've encountered. I have had to work with matrix maths, I have had to work with complex numbers, I've had to work with arbitrary precision numbers, and all of those situations would have been served so much better by having operator overloads. I was fighting with the language the entire time, I was trying to do those. But I understand you know, in a lot of web applications, those are not common problems to encounter. My experience of that isn't typical. The thing about the way that it's done is it tries to head off a lot of the ways that it could be misused. An example of that is that the RFC requires typing of the parameters. You can't define an operator method and leave the types blank. If you do, then you get a fatal error during compile. It tells you you must explicitly define a type. And the reason for this is that blank types are assumed to be mixed. So it's the same as putting mixed for the type within the engine. And a mixed type says I can take anything, it doesn't matter what you give me, I can take anything. But that simply isn't true for operators. It's never true. Because even if you think hey, I can accept floats, ints, I can accept any objects, I can figure something out with them. You know, even if you think that's true, what happens when somebody passes you a stream resource? I mean, that's part of mixed. Any implementation that says mixed is probably lying. This RFC requires you to document what are the types that you know how to interact with for this operator. And that's the thing that that developers are kind of going to be forced to think about when they implement this. You know, and that's one example. But there's several other things within the RFC that kind of try and take that concern very seriously. And say, what are the strategies we could design something that is going to be used correctly, most of the time, just by design. Derick Rethans 9:42 Would just not then create an inconsistency in the language where for some methods, you simply have to type the arguments. Jordan LeDoux 9:50 So yes, it's it is different than how other functions are defined. And methods are defined on classes, but that's one of the reasons that I believe very strongly that using a keyword other than function is a good idea. That's one of the other things that this RFC proposes is, instead of saying function plus or whatever, you say, operator plus. One of the things that that does is that signals to the developer, this is a different thing. That's not a trivial aspect of the RFC. It's not something that can just kind of be thrown away. It's like, oh, that sugar. In a very real way communicates to the developers, this is not like other functions, this is a different thing. It is a function internally within the engine. But that's because that's faster to do it that way. And it's a better way to implement it internally, within core. Developers should not be treating it in PHP as a function, it shouldn't be used that way. It's an engine hook. Derick Rethans 10:51 When you're writing the code. If you do operator plus, for example, then at that point, it's clear what the plus does, but not necessarily, when you read the code, and you see the plots, you don't necessarily know what it means, right? Which I think is one of the bigger criticisms of having operator overloading support. But then you can also make the argument saying that well, operators they have a specific meaning in normal language, right. The plus means adding two things. So the argument would be that only use the plus operator for adding things together, not for example, adding a comment to a blog post, which you technically could do, right? Jordan LeDoux 11:25 You could. Derick Rethans 11:26 I definitely say that is something you should definitely not do, which you could, for example. Jordan LeDoux 11:30 That's another reason to kind of not treat them as functions in the syntax. You know, I think that having that operator keyword there really communicates that strongly to PHP developers. You know, when you look at a line of code, that's variable A plus variable B, and you're sitting there thinking: Hmm, I wonder if there's an operator overload involved here, because that might be a thing you do have to think about if this were included in core. While that's an additional thing that might have to be investigated, you know, by developers, and that that's not a trivial thing, I completely acknowledge that. It's also not a thing that would happen by accident, it would have to be intentional, because all objects error, if they're used with an operator currently, and after this is introduced, all objects will continue to error unless they define their own overload within the class that's being called, or one of its parents obviously, because inheritance is respected. It's not a thing that would happen by accident, there's no code that's going to accidentally inject an object into an operator, and all of a sudden, PHP makes wild assumptions and your code is spitting out a number that doesn't make sense, or something like that, because it's simply going to error. This is going to error very early. So you're going to get that feedback from the engine right away, when you do something like that. Maybe you didn't intend or that maybe was ambiguous. Derick Rethans 12:55 I've just realized that in languages like C++, you can define multiple versions of the same operator, because you can have method overloading. This is not something you can do in PHP with normal methods either. So do I understand correctly that you can't do that in this case, either it, you need to accept multiple types in the overloaded operator, and then make a decision yourself. Jordan LeDoux 13:17 It was suggested to me by a couple of people who gave me very early feedback that, hey, C++ accomplishes this with method overloading, you should do method overloading. And I took one look at that and said: One, I'm already doing a lot of work for this, that sounds like double the work. And two, I'm not convinced that's the best way to do it. Three, that's a huge separate change, that should probably be considered separately. And four, I don't think it's necessary. You can accomplish it with Union types, which we have. And that's another thing that maybe this is a guardrail for PHP developers using it incorrectly. If you're unioning, eight different types, and maybe you're not using it correctly. I mean, that'll look ugly. And I'm people might complain: hey, I don't want to have to Union all these things. I want to be able to overload the method directly with multiple versions. Having that feedback, right in your code that: Hey, this looks ugly. Maybe I'm doing it wrong. I see it as a positive thing, in a lot of ways. Derick Rethans 14:19 I agree. First of all, it's a separate subject that should be discussed separately. Now, so far, we've only mentioned the operator keyword, but we haven't spoken about the rest of the syntax yet. So how would you define an overloaded operator? Jordan LeDoux 14:33 As we were discussing, there's the keyword operator. So you would define it very similar to how you would define a function. You can give it a visibility, but it can only accept the visibility public, you can omit that if you want. But it can be abstract or final. So you can have an abstract class that forces an implementation, or you can have a class that disallows overriding of the method. You use the keyword on operator, and then where the function name would go for any other function, you use the symbol that you want to overload, so you don't name it the English word plus, you use the actual symbol '+'. And then the rest of it is the way you would define any other function or method because it has a lot of the same concerns that functions do. But it visually looks very different, which I think is another good guardrail. Another good bit of feedback to developers. Derick Rethans 15:28 What are the arguments that the overload is operating methods need to accept? Jordan LeDoux 15:33 Most of them accept and actually require two arguments. The first is the corresponding operand. The things that are to the right and the left of your operator, they're called operands. And one of them will have this overload and the other one will be some kind of value. You need to accept the other value. And then the second parameter is the operand position, whether or not the operator overload being called; whether it's on the left side of the operator or the right side of the operator, because some some operations depend on whether or not it's on the left or right side. Derick Rethans 16:13 Would you say that most of the time, the operators will be used on two objects of the same class, in which case that doesn't really matter? Jordan LeDoux 16:22 A lot of the time, I think good implementations of this feature would involve objects that share a base class, share a parent class, or are the same class. I think it would be a very rare circumstance where a good usage of this feature would involve accepting a class that doesn't meet either of those criteria. Maybe it could happen, but I think in most situations, that would be another one of those things that kind of gives you you know, the code smell that a something may be wrong. Derick Rethans 16:55 Then of course, with the exception that, for example, vectors, you can multiply with a number. And I define number very loosely here. And then in that case, the order is important. So the RFC has a table of having a whole list of operators, but it doesn't include all of them. What kind of categories are included, which ones aren't? Jordan LeDoux 17:12 There's two main categories of operators that are proposed in this RFC, the mathematical operators, you know, your plus, minus, divide, multiply, the pow operator, and the modulo operator. And then the second class of operators are all the bitwise operators. So bitwise and, bitwise or, bitwise not, shift left, shift right, that kind of thing. Derick Rethans 17:37 And let's see in the table that It all says equals in the spaceship operators in there. But what I don't see in there, it's larger than, or smaller than operators. Jordan LeDoux 17:46 I made the decision very early when I was developing this RFC that I didn't want to support the comparison operators independently. And what I mean by that is, I didn't want to have an object that defined separate logic for the greater than sign than they did for the less than sign. That was mainly to avoid situations where reversing things would change the Boolean logic. Instead, there's a single operator, the comparison operator, or the spaceship operator, that allows you to overload all of them, but only in a way that's self consistent. By implementing that operator overload, you can cover all of the inequality operators, but it will always be consistent with its own output. It's never going to give you things that are logical contradictions with its own data. Derick Rethans 18:43 Would the overloaded spaceship operator implementation also be used for other comparisons, like greater than, less than and greater than equals? Jordan LeDoux 18:52 That's correct. Going into the implementation just a little bit. Internally, all of those operators, the greater than sign, the less than sign, greater than, and equals to, all of those are internally done as a comparison. That type of comparison where you're outputting, negative one, zero or positive one, they indicate, is it larger? Is it smaller? Is it equal? This actually keeps the PHP user land implementations more consistent with how things are done internally within the engine and makes it much easier to support all of those things, not just consistently, you know, without logical contradictions, but as far as how it gets done within the engine, it makes it much easier to handle those. Derick Rethans 19:39 Yeah, I see there's another few implied operators in there. For example, if you're like the -= operator, then that gets implied as $a = $a - $b and stuff like; that all seems to be fairly sensible there. And similar it like ++$a, you get $a = $a + 1, which is basically what that means. You mentioned the word implementation detail. And I have a question myself here is: The symbol tables contrary to support a plus or minus? So do they get transformed into a specific name, for example? Jordan LeDoux 20:12 Internally, the function name for a method on a class is stored as a Zend string, which can handle the symbols, it just doesn't. And that's mainly because the lexer can't; the parser is restricted from doing that, because it's kind of ambiguous in all contexts. For instance, outside of a class, following a function, using arbitrary symbols might cause some issues. But that's another thing that the operator keyword makes simpler. The operator keyword in the parser makes allowing the symbols much smaller implementation hurdle, I think that would be something that would be very difficult to do with the function keyword. But internally, it actually does get stored as the symbol. And then it gets put as a kind of an internal pointer with the other Magic Methods. Because internally, it's treated kind of like a magic method. Derick Rethans 21:07 Are they flagged with a specific flag or a bit, showing that they are overloaded methods? Jordan LeDoux 21:13 Yes, there's a new flag that's added as part of this. That's only for methods, ZEND_ACC_OPERATOR. Derick Rethans 21:21 Which I think becomes important if you start looking things like reflection. Because if you list all the methods on a class on the reflection class, then you sort of need to know, what are the already overloaded operator methods or normal methods? Jordan LeDoux 21:37 Yes, that's, that is something that became very important when I went into do the reflection implementation for this, which has also been completed at this point. As part of reflection, actually, I very much didn't want to return the operators with other methods. Because again, I don't think that developers should be encouraged to think of these as methods, in most circumstances. That having the flag there made that a very simple change. It was like three or four lines of code per implementation per method that was affected on the reflection classes, check the flag, and then we're done. We're out. Derick Rethans 22:13 In addition to that, of course, you gets operator specific reflection methods, right? Because you do want to check whether you have them. Jordan LeDoux 22:20 For normal methods, you have getMethod, getMethods, and hasMethod. And so there's three additional methods that are added to reflection class, getOperator, getOperators, and hasOperator, and they behave exactly the same way as the corresponding method ones, but they only deal with the operators. Derick Rethans 22:43 The RFC is talking about it an operator methods will be represented by reflection methods, which makes sense, but as you indicate there aren't really methods. And you shouldn't really think of them as methods. So would it not make sense to have a reflection operator method perhaps? Jordan LeDoux 22:59 I did consider that. So when I was looking at the implementation for ReflectionMethod, I was looking at the methods that you have on that. And I was saying to myself, is this something that shouldn't be there for operators that not only, you know, maybe it doesn't provide useful information, like for instance, isPrivate will always be false for operators because you can't make operators private, but it doesn't break for operators, it still works. And all of the methods on ReflectionMethod were of that nature. Some of them were not super useful for operators, but none of them were things that were broken, or that were totally didn't make sense. And so because of that, I thought, well, maybe it's better to just have ReflectionMethod and just use that again, instead of creating a separate one that doesn't really have any additional functionality. It's just a copy, essentially, so that they don't have to be maintained separately. Derick Rethans 23:57 I see in the RFC, that you're also adding the isOperator methods to reflection methods, so that you can distinguish between normal methods and operator overloaded methods, right, which is then I suppose the alternative to having a different instance class that represents either the method or the operator? Jordan LeDoux 24:15 So that was the only thing that I really saw as being necessary, necessarily different, is being able to tell is my instance of ReflectionMethod a normal method or an operator method. That could be solved by having a child class instead, that would be another way to do it, I can definitely see advantages of doing it that way. And I thought about doing it that way. It's already a very big RFC. I kind of wanted to reduce the amount of things that people had to think about or that people had to say, well, this is something different. This is already very different from a lot of things in PHP. And it was one of those things where I was like, that seems like a place where it's not necessary for me to create something new for people to consider. Derick Rethans 24:56 As you say, this is quite a long and complicated RFC. What's been the feedback been so far? Jordan LeDoux 25:02 A lot of the feedback so far has revolved around the new keyword, the operator keyword. You know, questions about why is this necessary, as opposed to using the function keyword, which we talked about already a little bit. And kind of going through, what are the implications of that, not just within PHP, but also downstream for tooling to things like Psalm, Rector, tools that PHP developers use IDEs, PhpStorm, you know, what are they going to have to do to handle this? And is that more difficult or less difficult with a keyword? Depending on what the answer to that is? Is that trade off worth it? Derick Rethans 25:41 Has there been any of the expected feedback saying: Oh, this is just going to be abused by users all over the place? Jordan LeDoux 25:47 There's been one or two so far, you know, I think operator overloading as a concept as a feature in programming. And this isn't restricted to PHP as a language. This is something that comes up in other languages, too. I think, as a concept, this feature is something that's always kind of been that way to a lot of languages. There's very few languages where people don't have strong opinions about it. Even in those languages, people don't really encounter that often. But it's the kind of thing that people feel strongly about. So I would always imagine that there are going to be people who, quite rightly, from their own experience, believe that this is just a bad idea. And I can understand why they would think that. I disagree, but I can understand why they would think that. I think about the only language I'm aware of that doesn't have that kind of thing going on is maybe R, but R is a language that's kind of designed around nothing but mathematics. So the idea of being able to control operators is kind of central to what the language does. So it's maybe the only example I can think of, but the rest of them, you know, it is somewhat controversial. And I think it kind of always will be, even if it gets accepted. Derick Rethans 26:54 Talking about that. When do you think you'd be opening voting for this? Jordan LeDoux 26:58 I'm thinking more along the lines of early January. I think holding the vote two weeks after I announced it on internals a second time, it would be right almost on top of Christmas, I think that would also kind of be a bit unkind, and also may not serve the RFC well. So I think waiting till January is probably the right idea. Derick Rethans 27:18 I think that's the nicer way of doing it as well. Yes. Do you have anything to add that we forgot to speak about? Jordan LeDoux 27:25 I wanted to mention going back to the operator keyword, and kind of the discussion around that. And the feedback that's been generated so far on that, a really good way to think about it is that the operator keyword is very similar to the enum keyword. Enums are classes, they simply are, but they're classes with very specific restrictions on them. The operator is a function, but it's a function with very specific restrictions on them. And it's for a lot of the same reasons. Enums are intended to be used for a very specific purpose. Operator overloads are also intended to be used for a very specific purpose. And that's one of the reasons that I think it's not not as bad of a thing. And I think that people really should be thinking about it more in terms of why we have the enum keyword instead of terms like, why don't we just use another magic method or something like that? You absolutely could do it that way, the same way that you could do enums it's just classes, but there's value there and doing it with its own keyword, I think. Derick Rethans 28:29 Well, thank you, Jordan for taking the time this morning or your night, to talk about the operator overloads proposal. Jordan LeDoux 28:35 Yeah, thank you for having me. Derick Rethans 28:41 Because I've been on hiatus for a while I wanted to jump in with a few newsworthy items. First of all, I would like to thank Nikita for the many years he worked on PHP, while being an employee of JetBrains. He has decided that he wants to work on something else besides PHP and choose to leave JetBrains to work on LLVM. This means that I will be speaking to him on this podcast a lot less, if at all. With Nikita's departure the PHP protect now has nobody working full time on it, as it is desirable for the continuation Nikita's old employer, JetBrains, has banded together with members of the PHP community, including core contributors, companies and sponsors to set up a foundation to fund contributors to work on PHP. Once this is up and running, I will make sure to dedicate an episode to this exciting new development. I have included a link to the foundation on Open Collective in the show notes. Just before Nikita left the project two more RFCs were passed. The first one was to move the PHP bug tracker from https://bugs.php.net to https://github.com/phps/php-src repository now accepts your bug reports, whereas the bugs.php.net system has been largely retired. We still accept security bugs on the old issue tracker because we can discuss these in private there before making them public. The second RFC implemented the deprecation of dynamic properties with PHP 8.2. Instead of allowing codes to define a rights to undeclared properties, they will now need to be defined in your class definition, otherwise, you will get a deprecation warning. I have included the link to this RFC in the show notes as well. I'm not sure whether I will produce a specific episode on the subject. With all the news out of the way, I'd like to thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying development of the PHP language. I maintain a Patreon for an account for sponsors of this podcast as well as the Xdebug debugging tool. You should sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: User Defined Operator Overloads RFC: Deprecate Dynamic Properties PHP Foundation on Open Collective Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 95: PHP 8.1 Celebrations
PHP Internals News: Episode 95: PHP 8.1 Celebrations London, UK Thursday, November 25th 2021, 09:23 GMT In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 8.1. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:23 This is episode 95. I've been absent on the podcast for the last few months due to other commitments. It takes approximately four hours to make each episode. And I can now unfortunately not really justify spending the time to work on it. I have yet to decide whether I will continue with it next year to bring you all the exciting development news for PHP 8.2. Derick Rethans 0:44 However, back to today, PHP eight one is going to be released today, November 25. In this episode, I'll look back at the previous episodes this year to highlight a new features that are being introduced in PHP 8.1. I am not revisiting the proposals that did not end up making it into PHP 8.1 feature two features I will let my original interview speak. I think you will hear Nikita Popov a lot as he's been so prolific, proposing and implementing many of the features of this new release. However, in the first episode of the year, I spoke with Larry about enumerations, which he was proposing together with Ilija Tovilo. I asked him what enumerations are. Larry Garfield 1:26 Enumerations, or enums, are a feature of a lot of programming languages. What they look like varies a lot depending on the language, but the basic concept is creating a type that has a fixed finite set of possible values. The classic example is booleans. Boolean is a type that has two and only two possible values true and false. Enumerations are way to let you define your own types like that, to say this type has two values Sort Ascending or Sort Descending. This type has four values for the four different card suits, and a standard card deck. Or a user can be in one of four states pending, approved, cancelled or active. And so those are the four possible values that this variable type can have. What that looks like varies widely depending on the language. In a language like C or C++, it's just a thin layer on top of integer constants, which means they get compiled away to introduce at compile time, and they don't actually do all that much they're a little bit to help for reading. On the other end of the spectrum, you have languages like rust or Swift, where enumerations are a robust, advanced data type and data construct of their own. That also supports algebraic data types. We'll get into that a bit more later. And is a core part of how a lot of the system actually works in practice, and a lot of other languages are somewhere in the middle. Our goal with this RFC is to give PHP more towards the advanced end of enumerations. Because there are perfectly good use cases for it, so let's not cheap out on it. Derick Rethans 3:14 In the next episode, I spoke with Aaron Piotrowski about another big new feature: fibres. Aaron Piotrowski 3:20 A few other languages already have Fibers like Ruby. And they're sort of similar to threads in that they contain a separate call stack and a separate memory stack. But they differ from threads in that they exist only within a single process and that they have to be switched to cooperatively by that process rather than pre-emptively by the OS like threads. And so the main motivation behind wanting to add this feature is to make asynchronous programming in PHP much easier and eliminate the distinction that usually exists between async code that has these promises and synchronous code that we're all used to. Derick Rethans 4:03 I also asked Aaron about small PHP I actually have a slightly related question that pops into my head as like. There's also something called Swoole PHP, which does something similar but from what I understand actually allows things to run in threats. How would you compare these two frameworks or approaches is probably the better word? Aaron Piotrowski 4:25 Swoole is they try and be the Swiss Army Knife in a lot of ways where they provide tools to do just about everything. And they provide a lot of opinionated API's for things that in this case, I'm trying to provide just the lowest level just the only the very necessary tools that would be required in core to implement Fibers. Derick Rethans 4:48 Although I discussed several deprecations from Nikita and the last year, I only want to focus on the new features. In episode 76. I spoke with him about array unpacking, after talking about changes to Null in internal functions. Nikita Popov 5:01 The old background is set we have unpacking calls. If you have the arguments for the call in an array, then you write the free dots and the array is unpacked intellectual arguments. Now what this RFC is about is to do same change for array unpacking, so allow you to also use string keys. Derick Rethans 5:24 In another episode, I spoke with David Gebler on a more specific addition of a new function fsync. David explains the reason why he wants to add this to PHP. David Gebler 5:34 It's an interesting question, I suppose in one sense, I've always felt that the absence of fsync and some interface to fsync is provided by most other high level languages has always been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP core getting to learn the source code. And it's a very small contribution, but it's one that I feel is potentially useful. And it was easy for me to do as a learning exercise. Derick Rethans 5:58 And that is how things are added to PHP sometimes, to learn something new and add something useful at the same time. After discussing the move of the PHP documentation to GIT an episode 78, in Episode 79, I spoke with Nikita about his new in initializers RFC. He says: Nikita Popov 6:15 So my addition is a very small one, actually, my own will, I'm only allowing a single new thing and that's using new. So you can use new whatever as a parameter default, property default, and so on. Derick Rethans 6:29 The addition of this change also makes it possible to use nested attributes. Nikita explains: Nikita Popov 6:34 I have to be honest, I didn't think about attributes at all, when writing this proposal. What I had in mind is mainly parameter defaults and property defaults. But yeah, attribute arguments also use the same mechanism and are under the same limitations. So now you can use new as an attribute argument. And this can be used to effectively nest attributes. Derick Rethans 6:59 Static Analysis tools are used more and more with PHP, and I spoke to the authors of the two main tools, Matt Brown, of Psalm, and Ondrej Mirtes of PHPStan. They propose to get her to add a new return type called noreturn. I asked him what it does and what it is used for. Ondrej Mirtes 7:14 Right now the PHP community most likely waits for someone to implement generics and intersection types, which are also widely adopted in PHP docs. But there's also noreturn, a little bit more subtle concept that would also benefit from being in the language. It marks functions and methods that always throw an exception. Or always exit or enter an infinite loop. Calling such function or method guarantees that nothing will be executed after it. This is useful for static analysis, because we can use it for type inference. Derick Rethans 7:49 Beyond syntax, each new version of PHP also adds new functions and classes. We already touched on the new fsync function, but Mel Dafort proposed to out the IntlDatePatternGenerator class to help with formatting dates according to specific locales in a more specific way. She explains: Mel Dafert 8:07 Currently, PHP exposes the ability for locale dependent date formatting with the IntlDateFormat class, it says basically only three options for the format long, medium and short. These options are not flexible in enough in some cases, however, for example, the most common German format is de dot numerical month dot long version of the year. However, neither the medium nor the short version provide and they use either the long version of the month or a short version of the year, neither of which were acceptable in my situation. Derick Rethans 8:40 And she continues with her proposal: Mel Dafert 8:42 ICU exposes a class called DateTimePatternGenerator, which you can pass a locale and so called skeleton and it generates the correct formatting pattern for you. The skeleton just includes which parts are supposed to include it to be included in the pattern, for example, the numerical date, numerical months and the long year, and this will generate exactly the pattern I wanted earlier. This is also a lot more flexible. For example, the skeleton can also just consist of the month and the year, which was also not possible so far. I'm proposing to add IntlDatePatternGenerator class to PHP, which can be constructed for locales and exposes the get best pattern method that generates a pattern from a skeleton for that locale. Derick Rethans 9:26 Locales and internationalization have always been an interest for me, and I'm glad that this made it into PHP 8.1. I spoke at length with Nikita about his property accessors RFC, in which he was suggesting to add a rich set of features with regard to accessibility of properties, including read only, get/set function calls, and asymmetric visibility. He did not end up proposing this RFC, which he already hinted that during our chat: Nikita Popov 9:53 I am still considering if I want to explore the simpler alternatives. First, there was already a proposal, another rejected proposal for Read Only properties probably was called Write Once Properties at the time. But yeah, I kind of do think that it might make sense to try something like that again before going to the full accessors proposal, or instead. Derick Rethans 10:18 He did then later proposed a simpler RFC read only properties, which did get included into PHP eight as a new syntax feature. He explains again: Nikita Popov 10:27 This RFC is proposing read only properties, which means that a property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have Type Properties. Remaining problem with them is that people are not confident making public type properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that a string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value. For example, one nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP. Derick Rethans 11:21 Nikita, of course, meant this kind of immutable object pattern, which we didn't pick up on during the episode. Another big change was the PHP type system, where George Peter proposed out pure intersection types. He explains what it is: George Peter Banyard 11:35 I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tells you I want X or Y, whereas intersection types tell you that I want x and y to be true at the same time. The easiest example I can come up with is a traversable that you want to be countable as well. Derick Rethans 11:54 To explain our pure George Peter says: George Peter Banyard 11:58 So the word pure here is not very semantically, it's more that you cannot mix union types and intersection types together. Derick Rethans 12:06 Just after the feature freeze for PHP 8.1 happened in July, another RFC was proposed by Nicolas Grekas to allow the new pure intersection types to be nullable as well. But as that RFC was too late, and would change the pure intersection type to just intersection types, it was ultimately rejected. Derick Rethans 12:23 The last feature that I discussed in a normal run of the podcasts was Nikita's first class callable syntax support. He explains why the current callable syntax that uses strings and arrays with strings has problems: Nikita Popov 12:35 So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two string signs inside it, it could just be an array with two strings, you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array should also be this array elements will also be renamed. But that's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that colour bulls are not scope independent. For example, if you have a private method, then like at the point where you create your, your callable, like as an array, it might be callable there, but then you pass it to some other function, and that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both the like this callable syntax based on arrays, and also the callable type, is callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callable culpability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable. Derick Rethans 14:08 This new feature is a subset of another RFC called partial function applications, which was proposed by Paul Crovella, Levi Morrison, Joe Watkins, and Larry Garfield, but ultimately got declined. So there we have it, a whirlwind tour of the major new features in PHP 8.1. I hope you will enjoy them. As I said in the introduction, I'm not sure if I will continue with the podcast to talk about PHP 8.2 features in 2022 due to time constraints. Let me know if you have any suggestions. Derick Rethans 14:41 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes Episode #73: Enumerations Episode #74: Fibers Episode #76: Array Unpacking Episode #77: fsync function Episode #79: New in Initialisers Episode #81: noreturn type Episode #85: Add IntlDatePatternGenerator Episode #86: Property Accessors Episode #88: Pure Intersection Types Episode #90: Readonly Properties Episode #92: First-Class Callable Syntax Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 94: Unwrap Reference After Foreach
PHP Internals News: Episode 94: Unwrap Reference After Foreach London, UK Thursday, August 26th 2021, 09:22 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 94. Today I'm talking with Nikita Popov about the unwrap reference after foreach RFC that he's proposing. Nikita, would you please introduce yourself? Nikita Popov 0:33 Hi, Derick. I'm Nikita and I work at JetBrains on PHP core development. Derick Rethans 0:38 So no changes compared to the last time. Nikita Popov 0:41 Yes, at the time before that. Derick Rethans 0:43 So what is the problem that is RFC is going to solve? Nikita Popov 0:46 Well, it's really a very minor thing. I think it's a relatively well known problem for the more experienced PHP programmers. It's like a classic example, you have a foreach loop by reference. So foreach array as value by reference, and then you do a second loop after that, foreach array as value at the same it's by value. So without the reference sign. The result of that is that your last two array elements are going to be the same, which is kind of unexpected. If you're not familiar with how references in PHP work and scoping in PHP works. So I think it's worth explaining what's going on there. Derick Rethans 1:27 Can you quickly explain the scoping or rather the lack of it, I suppose? Nikita Popov 1:31 Yeah, it's really the lack of PHP really only has function scoping. So if you have a foreach array as value, then the value variable is going to stay alive, even after the foreach loop. And usually, that won't make much of a difference. So you will just have like reference to the last element of the array, might even be useful for some cases, you know, before we added the array, I think, array_key_last function. If the last element now is a reference, so if you have a reference to the last element, then you're write into that variable is also going to modify the last element of the array. So if you now have a second foreach loop, using the same variable, that's actually not just modifying that variable, but it's also always modifying the last element of the array. Derick Rethans 2:15 Okay, just to clarify, it isn't necessarily the last element in the foreach loop. It's the last one that's been assigned to? Nikita Popov 2:22 Yeah, that's, that's true. Derick Rethans 2:24 Is this not something that people actually use for some useful reasons? Nikita Popov 2:28 As mentioned before, technically, you could use it to get a reference to the last element and then modify the last element outside the foreach loop. I don't think this is a particularly common use case. But I'm sure people have used in here there. This is a use case we would break with the proposed RFC. Derick Rethans 2:47 I think it is one I have used in the past, it's probably not how I would do it now. But I'm pretty sure I have some point in the past. What are you proposing to change with this RFC? Nikita Popov 2:57 The change is pretty simple. And that's to unwrap or to break the reference after the loop. You will still have like after the loop, the variable will still contain the value of the last element, or of the last like visited element, but it will no longer be a reference to it. If you write into the variable, it will not modify the original array. And if you have a second loop that writes into the variable that also doesn't modify the original error any more. Derick Rethans 3:25 At which point and how is this reference broken? Nikita Popov 3:29 It's at the end of the foreach loop, or as you say, if you break out too early, then of course, it would also get broken. So it's referenced inside the foreach loop and stops being referenced outside the loop. Derick Rethans 3:41 And that would happen also, if I would use a goto for example? Nikita Popov 3:45 Oh, that that's a trick question, actually, yes, it should happen. But now that you have mentioned it, I think my current implementation does not handle that particular case, I will have to double check it. But that should happen, yes. Derick Rethans 4:00 It's good to know that you've thought about it then. Nikita Popov 4:02 Well, I didn't think about it. Because I mean, I guess I can mention it here, the way this works is that well, at the end of the foreach loop, we have like an instruction that frees the loop variable. And I can just add an additional one that breaks reference. But if you use things like goto or multi level breaks, or something like that, then we insert these clean-up instructions before the jump. We have to make sure to actually insert the reference breaking instruction there as well. So it's like not automatically handled. Derick Rethans 4:38 Is this going to be a separate instruction or as we tend to call them opcodes? Nikita Popov 4:43 I'm using a separate one, but one could run it as a flag into the instruction that frees the loop variable, but I think it's cleaner to have a separate instruction for it. Like technically one could optimize it away in some cases, like I wouldn't bother but it's like semantically a different thing. Derick Rethans 5:01 I think it'd be nicer result, because it makes it easier to visualize what's happening, right? Nikita Popov 5:06 Yeah, it is. Derick Rethans 5:07 Did you actually check whether some code uses this construct? Nikita Popov 5:10 I have to admit, I tried checking it using a very basic approach, just look at foreach loops by reference. And then if the variable is used after that. But that kind of primitive approach has way too many false positives, for example, you have a foreach loop inside, and if, and then the variable is reused inside an else. So it like wouldn't flow from the if into the else. So you would have to do some kind of more sophisticated control flow analysis. It's something that can be done, but I didn't bother doing it for a one off backwards compatibility check. So I don't have any hard data on how much code is actually using something like this. Derick Rethans 5:51 So this is where I'm a little bit on the fence about this change, because it is changing behaviour, that's going to be pretty hard to figure out what is actually going to affect your codebase. Nikita Popov 6:01 It should be possible to very reliably detect that. It's just something you have to actually implement. But you're right now there is no easy way to check that. Derick Rethans 6:13 It's something that static analysers could probably have a look at. Nikita Popov 6:16 Yeah, expect that maybe Psalm or PHPStan, something like that will be easier to implement, because they already have control flow information. Derick Rethans 6:23 You don't really know how impactful this, which is, in my opinion, a bit of the scary bit. How important do you think you'll find it to have this RFC going through and implemented? Nikita Popov 6:33 I don't think it's super important. It's mostly like, small quality of life fix for newer developers . People who have already encountered this issue once won't forget about it again. In fact, it's somewhat common recommendation that you should always unset the loop variable after a foreach by reference loop. So I've seen that as like a policy some people use, that could be avoided. So yeah, I don't think it's a critical feature, just a small improvement. Derick Rethans 7:08 Would it be an alternative idea to instead deprecate the foreach by reference? Nikita Popov 7:14 Okay, that's the radical approach. Everything is possible. I think that foreach by reference is relatively, I mean, I think it's one of the most common uses of references we have, and one of the most reasonable ones. I mean, the alternative is search into by value loop, and then you modify it by looking up the element by key again, which is a bit more ugly, I would say. I think we shouldn't deprecate foreach by reference, though it would be kind of nice to have a different way to achieve the same. One other unfortunate thing about foreach by reference is that it leaves behind references in the array. The case I'm looking at here is this reference to the last element, where you have like reference structure that's pointed to both from inside the array, and from this loop variable. The other thing that foreach by reference does is that for all the other array elements, you will actually leave behind the reference wrapper that's just used in this one single place for this single array element. Essentially, you are wasting memory, because we will leave behind this that reference wrapper. So after you do the foreach by reference loop over the array, the array will actually grow larger. So if you're storing like integers, and it may grow significantly larger, like from a technical perspective, foreach by references, also not great. But like from a usability perspective, it's nicer then modifying values by key lookup. Derick Rethans 8:53 I guess it's going to depend on how big the array is, right? I mean, if it's a few elements, it probably doesn't matter. Nikita Popov 8:58 But if you have like a 100,000 element array, then you paying for 100,000 reference wrappers that you don't need afterwards any more. Derick Rethans 9:07 In that case, it's rather better to just modify it through the key that you obtained by doing foreach key as value. Nikita Popov 9:14 Right. But it's also worth noting that foreach reference actually has different semantics then foreach value, because foreach by value works on the copy of the array. Like it's not an actual copy just like semantically. If you modify the array inside the foreach by value loop, then we will copy the array. Doing the modification with a separate key lookup and foreach by value loop will actually copy the array at that point, while foreach by reference takes account modifications of the array. So even if you like add or remove elements in the array in the foreach by reference loop, it will try on the like best effort basis to still iterate on in a reasonable way on the modified array. It's like not a straightforward replacement. Derick Rethans 10:00 It all depends on what people intended to do with it. Right? Do you think there are any further situations that are a bit strange? That could benefit from having some subtle changes to the language semantics? Nikita Popov 10:13 Nothing can who comes to mind immediately. Derick Rethans 10:16 Yeah, I can't think of any either. But I thought maybe maybe have something in the pipeline. Would you have anything else to add to this RFC? Nikita Popov 10:23 Well, one more thing that's discussed in the RFC is the case of complex variables. A little known fact, in the foreach loop, you don't have to assign to a simple variable, you can also assign to something like an object property, or an object property on the result of a function call that that means that in the loop, this function is getting called on every iteration, and then you assign it to a property on the result. So you can do that kind of weird stuff, we allow it. Derick Rethans 10:52 And does it the work without any weird side effects? Nikita Popov 10:56 Depends on what you consider weird, but basically does what you expect as if you had written an explicit assignment to the complex variable. Derick Rethans 11:04 I reckon that's how it's instructed out in the oparray then as well. Nikita Popov 11:07 Yeah, exactly. As far as this RFC is concerned, the problem there is that to unwrap the reference of the loop, we actually have to evaluate the variable again. And if it's a complex variable that might have side effects, for example, the function call. And that's why the RFC says that if the variable is complex, we are not going to do that, like that's probably going to be more unexpected than leaving a reference wrapper around. So we have this extra weird edge case. In the internals discussion, some people already suggested that maybe we should just deprecate support for these kind of complex assignments. One could also mention that an alternative that has been suggested is to actually make the loop variable, scoped to the foreach loop. So we could unset it entirely after the loop, rather than just breaking the reference, which is, of course, a larger change, larger backwards compatibility break. It also doesn't really align with PHP semantics of only having function scope and not block scope. Derick Rethans 12:06 I probably agree without, it's too much of a change to do that. Because then you sort of expect that all the language constructs should have a scope. I mean, it needs to be either one or the other. Nikita Popov 12:15 Yeah, I mean, other languages like JavaScript have solved that by introducing a separate way to declare scoped variables. So that will be "let", just changing the behaviour in one place is probably not a good idea. Derick Rethans 12:30 I probably agree with you though. It was a bit of a shorter RFC this time. That's okay with me. Nikita Popov 12:35 Yes, I used that as an excuse to discuss some foreach behaviour details. Derick Rethans 12:40 Fair enough. Thank you for taking the time this morning to come and talk to me about the references after foreach RFC. Nikita Popov 12:47 Thanks for having me, Derick, once again. Derick Rethans 12:53 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening. I'll see you next time. Show Notes RFC: Unwrap Reference After Foreach Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 93: Never For Parameter Types
PHP Internals News: Episode 93: Never For Parameter Types London, UK Thursday, August 19th 2021, 09:21 BST In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "Never For Parameter Types" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 93. It's been quiet over the last month, so it didn't really have a chance to talk about upcoming RFCs mostly because there were none. However, PHP eight one's feature freeze has happened now, a new RFCs are being targeted for the next version of PHP eight two. Today I'm talking with Jordan LeDoux, about the Never For Parameter Types RFC, the first one targeting this upcoming PHP version. Jordan, would you please introduce yourself? Jordan LeDoux 0:50 Certainly. And thanks for having me. My name is Jordan. I've worked as a developer for about 15 years now. Most of my career has been spent working in PHP. Although professionally, I've had experience working in C#, Python, TypeScript, mostly in the form of JavaScript, but a little bit of Node and, you know, a variety of other languages that I haven't spent enough time in to really be proficient in any real way. But recently, I decided to do something that I have thought about doing for many years, but never actually jumped into which is exploring the PHP engine itself and how I could possibly contribute to it. Derick Rethans 1:32 And here we are, but your first our thing. Jordan LeDoux 1:35 Yeah, it's exciting. Derick Rethans 1:36 What is this RFC about, what does it propose? Jordan LeDoux 1:39 Well, this RFC proposes allowing the never type, which was added in 8.1 as a return value, to parameters for functions and methods on objects. The main idea behind that is that when never was proposed as a return type, it was meant to signal that the function would never return. Not that it returns void, which of course, void signifies which is returning no value or returning, returning without any specified information. And never return signifies that the function will never return, which is a concept that exists in many other languages. And for that purpose in other languages, what's usually used is something called a bottom type. And that's what never ended up being. And I'm proposing that we extend the use of that bottom type to other areas where the type may be helpful. Derick Rethans 2:38 So a bottom type, that might be a new term for many people, it will certainly for me when I looked at the never RFC for as return types. Can you sort of explain what a bottom type is, especially thinking about object oriented theory with something that we'd like to call the Liskov Substitution Principle? And also, how does it apply to argument types? Jordan LeDoux 2:59 Let's start with the Liskov Substitution. The general idea behind Liskov Substitution is that if A is a subtype of B, then anywhere that A exists, you should be able to substitute B. It has to do with when you have a class hierarchy in in an object oriented language, that that class hierarchy guarantees certain things about substitutionality, like whether or not something can be substituted for something else. That affects language design in ways that a lot of programmers are kind of intuitively familiar with, but maybe not familiar with the theory and the ideas behind it more concretely. But LSP is the principle in SOLID, that's the L and SOLID. And it represents a portion of the whole idea of object oriented programming in PHP. Part of being able to substitute one object for another, based on their class hierarchy, and what they implement, and what they provide is there part of their ability to be substituted is whether or not they can fulfil the same kind of contractual requirements of typing. And with Liskov, that means that preconditions can never be strengthened, and post conditions can never be weakened. So a precondition would be a parameter type requirement. If you require that a parameters accepts an object, for instance, in PHP, you can't strengthen that requirement beyond just any object to a particular object. But you can weaken it from an object to an object or an integer with with unions. That's an example of the precondition side of it. The post condition side of it is that you can't, you can't weaken it. So if you have have, you know, if you have a return type of int, you can't have an inherited implementation return int or float, because that broadens the possible return types. They go in opposite directions. And one of them is covariance and one of them is contravariance. Derick Rethans 5:18 I can never remember which one is which. Jordan LeDoux 5:21 Yeah, basically contravariance go up the tree and covariance go down the tree. If you're thinking about widening, or sorry, narrowing. If you're thinking about narrowing, then covariance go down the implementation tree and contravariance go up the implementation tree. Derick Rethans 5:40 Okay, so how does the bottom type fit in here? Jordan LeDoux 5:43 The bottom type in any type system represents like the base type that all types originate from. And the best way in my mind to think about it is kind of just integer math. It's a thing that every programmer is going to be familiar with. And it fits, it fits all right. So you can think of the bottom type as zero integers, a lot of people would think of null as zero if they're thinking about a type system, but null is more like negative one. It's like the entire negative side of the integer system. We could say that the string type is one, and the integer type is two. And the float type is three, and maybe int or float, the union of them is five, which would be the two numbers added together. And if you describe type systems this way, then you can say, hey, if I take any type and add the value that represents the other type, then I get my result type. So if I take zero, the bottom type, and I add one, the string type, what I end up with is one, still the string type. So the bottom type is whatever type system or whatever, whatever type, when you add any other type to it, you get the type you added to it and nothing else, you just get your original thing. That's why it's called the union identity for the type system. The top type in PHP is mixed. And that's the opposite side of it. It's just like zero is the additive identity. One is the multiplicative identity. If you multiply anything by one, you're going to get what you originally had. And if you add anything to zero, you'll get what you originally had. So mixed ends up being, or the top type in general, ends up being the intersection identity, and the bottom type, or never, in PHP's case, ends up being the union identity. And all this is like deep type theory, but most programmers don't have to interact with it. It's more something that affects language design usually. Derick Rethans 7:48 Could you think of mixed as being infinity? Jordan LeDoux 7:50 That's actually with my with my crude integer analogy, yeah, it would be like all types, all possible types are added together. Derick Rethans 7:59 That makes sense then. Okay, so we have explained what the bottom type is, but, and never being the bottom type. So why is it useful to use the bottom type, or never, as this RFC proposes, as a method argument type for parameters? Jordan LeDoux 8:14 The largest benefit has to do with what we were talking about when it comes to covariance versus contravariance. You know, can you strengthen the requirements? Or can you weaken the requirements? When you inherit a method in a system that preserves Liskov Substitution, the parameters can be widened, they can accept more things. If I had an interface that said, it has one parameter, and that parameter is typed as int, then in any implementation, I could say, okay, but actually, the parameter type is int or float, I'm going to accept both. And I could do that in the implementation. But I would have to accept int, because that's part of the contract. That's part of the interface. So I have to accept whatever type is in the interface. I can just additionally add things on top of that. If I make my original definition, my root definition, the bottom type, then I can add any type to it. And I will just get that type. From never, if I had an interface with, you know, a method foo, and it has one argument, and that argument is typed never, I can re-type that argument as int, or I could re-type that argument as string, and all of them would be valid inheritances. Derick Rethans 9:38 So it's a way of getting around having at least one concrete type like int in an interface. Jordan LeDoux 9:44 Right. And in fact, never is a concrete type. It's a concrete type, that means this code can never be called. For return type, it means this type will never return. But if you're, if what you're trying to do is call it then you can never call it. Derick Rethans 10:01 Which is then why never actually makes sense as a type name, because one of my further questions is going to be how does never as a name make sense? But you've now explained it in such a way that it actually does make sense. So there we go. Jordan LeDoux 10:15 One of the questions that did come up in the internals discussion on this so far, has been a round that choosing never, and because a lot of the examples for use cases are around inheritance, it makes some kind of intuitive sense, if you're describing the inheritance behaviour, to name it something like any or anything, or you know, something about its how permissive it is in its inheritance definition. But the type should really describe the code that it's actually written in. You know, getting outside the idea of an interface or an abstract or something like that. If you have a function, and that function types it as never, or whatever you name, the bottom type. There's no data that can satisfy that. Because any data will have some type other than never. It'll have string, or int, or something. Even null has the null type. You can't provide any data to a function that requires never that will satisfy its type requirement. So that code can never be called. It's really when you start considering how does this affect inheritance that you start getting into this concept of Oh, maybe a different word makes sense. But then that doesn't really reflect the opposite side on the return side, when you're talking about covariance instead of contravariance. And that's why the bottom type for most languages is something like never, or nothing, or nil, something along those lines. Derick Rethans 11:48 So if you type an argument to a method as never, the engine will, of course, enforce that you can't call it with any data because it wouldn't satisfy. But would it also automatically make a method abstract so that inheritance inheriting classes have to widen it or not? Jordan LeDoux 12:04 So that was part of my original idea behind the RFC, was forcing something that implements it or something that inherits it, to widen it. However, there were a lot of good arguments about why that may not be a good idea. One is that in PHP, an empty type isn't an empty type, it's mixed, not widening, would actually just be saying the type is mixed, which is valid, you can go from the bottom type to the top type in a contravariant way, that's a totally valid way to do it. The problem is that PHP has a weakly enforced typing system. Like that's only a problem in this context, it's actually a very powerful feature of the language in a lot of other contexts. So we don't necessarily want to get rid of that. In addition to that, the actual mixed type as a literal that can be used in the language was only added in 8.0. It would kind of represent a much larger backwards compatibility break to require it to be explicitly widened. That was part of my original concept for a lot of the reasons that you were just talking about. But for PHP, specifically, it would probably present more problems than it would provide kind of solutions and utility. I've kind of been convinced off that point a little bit by the arguments of others. Derick Rethans 13:22 Would it be possible to instantiate a class that has a method with its argument typed as never? Jordan LeDoux 13:28 Yes, as long as he never called that argument directly.Using a never type in a constructor would definitely be a definitely be a No, no, that would result in a type error. As soon as you try to instantiate the class. Derick Rethans 13:41 It would basically make the constructor private. Jordan LeDoux 13:43 Yeah, you would get a slightly less useful error. Derick Rethans 13:47 Yes. Jordan LeDoux 13:48 Then you do if you make the constructor private, and then try and call it. Derick Rethans 13:53 It makes no sense to do it. The RFC slightly touches on generics. And it goes in a way talking about why this is sort of slightly like generics. Could you explain the interaction between these two concepts? Jordan LeDoux 14:07 Generics is a feature obviously, that a lot of PHP developers want. And it's also a very complicated feature to do. Way outside of what I was willing to consider for, for my first, my first attempt at something useful. One of the most common ways that generics are used, is within an inheritance structure to allow something similar to type widening, particularly for parameters. Being able to say, I want the type to be able to be widened, but I don't know exactly how it will be widened. That's something that generics offer. Generics offer many other features as well and many other capabilities. But that particular one is something that this can do. It comes with a cost though, because this isn't generics. It is not really the right way to do that. It can be done without generating compile errors now, if you use never, that's the main difference. You still would encounter errors in static analysis and IDE hints, for instance. The IDE, he wouldn't be able to tell if you typed against a interface that had never as a parameter type, it wouldn't be able to tell what sorts of types the implementers have. Because that's not really the point of accepting an interface as a type for a parameter, or for a function call, or something like that. The point is that any implementer of this will be an acceptable type. But that means that from a static analysis perspective, it won't know what the type requirements are, because it won't know what concrete implementations are being provided in the code. So obviously, this is a limitation. And this limitation does not exist in a in an actual generics implementation of some kind. I do view it as an improvement personally. And the main reason is that before, if you tried to do something like this, then the errors that you generated and the problems that you caused, were in your code, in its actual execution. Now, the errors and the problems that you need to solve are going to be in your static analysis or in your IDE. And that's, that's something that's a lot safer for the code in general. It can present some maintainability challenges, it can be annoying for developers to deal with, all of that's absolutely true. And it doesn't provide all the things that you would want from being able to do that. It moves the where the safety problems are from being in code execution, where it can cause real problems into code writing, which is where you have the opportunity to kind of think about it, reason about it and catch it. Derick Rethans 16:54 From what I understand is that if you have a class doing implementing some kind of generics, then you'd often expect that where this generic type is used in either argument parameter, or return value, that'd be the same for all the methods, whereas with never, you can of course not enforce it, that it isn't being the same type being used, in all the other places where you would otherwise expect or enforce with generics to be the same type. So I reckon that's one of the differences. I think that was a better explanation that what I read from the RFC. Jordan LeDoux 17:25 That was a explanation and in argument that I wasn't forced to actually articulate until I presented it to other people, because I went through several days of research before even writing the RFC. And sometimes when you do that, particularly for you know, for programming related things, you just absorb the information and you kind of forget about what did I, which things were new, like which things that I just learned, and which things that I already know. You just integrate it all into your new programming knowledge. And so that was definitely something that when I wrote the first draft of the RFC, I didn't, didn't articulate particularly well, because I it just made sense in my head. And I had forgotten: Hey, this is something I didn't know a week ago, I should probably explain it to others. Derick Rethans 18:10 That's why we have the RFC process, right? So that other people also voice the opinion, and perhaps all the slightly confused language that make perfect sense to the author, but not necessarily to other people that read it, right? Jordan LeDoux 18:23 Yes. Derick Rethans 18:24 The RFC kindly mentioned that there's no backwards incompatible changes. So that's always good news, that makes it a little bit easy to accept. What was sort of the biggest pushback against the RFC? Jordan LeDoux 18:36 Initially, actually, the most pushback that I got, when I presented it, was a round choosing never as the name, which I thought would, in my mind, I thought would be something that was barely discussed, actually. But I mean, that again, just like you said, that's why this process exists. So that everybody can actually understand, you know, the things that go into it. I did do the research into that prior, but I couldn't find a single language that had more than one bottom type. The concept of more than one bottom type itself, if you if you go back to the integers concept, that'd be like having what positive zero and negative zero or something like that. It's a concept that just intuitively when you, when you understand how the type system itself works, you feel like okay, so there's probably something wrong from a design perspective, if you have more than one bottom type. It's very easy to not be able to see that intuitively. If you don't go in and take a really deep look at how the type system works or, or how types in general work and what they mean and stuff like that. That was the biggest pushback that I got initially. That mostly just involved explaining the things that I just said about like, why is that the case? Why does that make sense? What are some examples of other languages? Just going into that kind of information. The biggest blocker at the moment, really is about, does it make sense to provide never as a parameter type? If you can't use that statically? If you can't use it in a static analysis situation, then does it make sense to ever use it? And if it doesn't make sense to ever use it, then even if it provides contravariance, is it worth adding? And that's the main discussion that's being had right now, that's not entirely resolved. I think that there is still value in doing that. I think that that argument would carry a lot more weight with me personally, if PHP had a better way of handling the situations where that might happen. That's a much larger undertaking, which I would be interested in, but is also not really not really something that would be as kind from a backward compatibility standpoint. Derick Rethans 20:54 Which we are quite keen on. Jordan LeDoux 20:57 Right, exactly. I don't personally see a way that this type of functionality could be provided, that could satisfy that concern, and not also invalidate enormous amounts of code that currently exist. You kind of have to choose one or the other from my understanding. I will be very pleased if somebody is able to provide me a way to, even if it's a lot of work, provide me a way that that can be accomplished without breaking a lot of code. That was one of my goals is I don't want this, I want this to be an addition to what currently exists, not not something that breaks the current typing that we have, or the way that PHP developers currently interact with the type system or anything like that. That's the current discussion. And I think the thing that is most unresolved. Derick Rethans 21:46 I think you'll have a hard time trying to introduce breaking changes into the language. And I also think rightfully so. Jordan LeDoux 21:54 Yeah, it's it's a good safeguard. And it's a good principle to have in general, I think. I think the way that I described it is that this type of breaking change, the type that would be necessary, wouldn't really be like the type of breaking change you expect in a major version, it would more be like a parallel language syntax. It'd be more like rewriting PHP into a different language with very similar semantics, but very different idea of what the language means underneath. Because fundamentally, in order to do that, typing could no longer be optional anywhere. It would have to be every variable, every piece of data, every function, would have to have an explicit type that the developer is able to control and modify and mutate as they want. I don't even know if type juggling would be possible with the kind of change that would be necessary. And that's, that's half of what PHP consider PHP, so. Derick Rethans 22:55 Definitely not requiring types and places. Because I think, as I said, I think you'll have a hard time convincing people to go that way. Jordan LeDoux 23:02 Which is the reason I didn't consider going that way, really, so. Derick Rethans 23:06 Would you have anything else to add that I'm, that we missed discussing this RFC? Jordan LeDoux 23:10 This RFC is really interesting to me personally, just from an intellectual perspective. I think that for a lot of users, there's not a lot of use cases where you would use it in your own programs. If you did, it would end up being in like the very core base systems, and only in a few places. In those places, it might really shine, it might be something that's absolutely incredible. Most places in most programs you would never be using, well, you would never be using this type. It's much more critical for some of the internal features things like array access, that interface, and the typing that it requires for parameters. The typing for that's pretty broken. This could be a way that we could fix that possibly, and a couple of the other internal engine features that are implemented through interfaces and things like that would also probably be helped quite a bit by this. Derick Rethans 24:10 Thank you very much for taking the time today to talk about the Never For Parameter Types RFC. Jordan LeDoux 24:16 Thank you for having me. Derick Rethans 24:20 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of a PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Never For Parameter Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 92: First Class Callable Syntax
PHP Internals News: Episode 92: First Class Callable Syntax London, UK Thursday, July 22nd 2021, 09:20 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 92. Today I'm talking with Nikita Popov about a first class callable syntax RFC that he's proposing together with Joe Watkins. Nikita, would you please introduce yourself? Nikita Popov 0:36 Hi, Derick. I'm Nikita and I am still working at JetBrains. And still working on PHP core development. Derick Rethans 0:43 Just like about half an hour ago when we recorded an earlier episode. Nikita Popov 0:47 Exactly. Derick Rethans 0:48 This RFC has no relation to read only properties. What is the first class callable syntax RFC about? Nikita Popov 0:55 The context here is that PHP has the callable syntax based on literals, which is that if you just use a plain string, it's interpreted as a function name, and an array where the first element is an object, and the second one is a method name, that's methods. Or the first element is the class name, and the second one is method name, that's a static method. Derick Rethans 1:17 I would consider this concept a bit of a hack, especially the the one with the arrays, and I reckon you feel similar and hence this RFC? Nikita Popov 1:27 Yes, I do. So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two strings inside it, it could just be an array with two strings, you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array should also be this array element will also be renamed. But there's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that callables are not scope independent. For example, if you have a private method, then like at the point where you create your callable, like as an array, it might be callable there, but then you pass it to some other function. And that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both like this callable syntax based on arrays, and also the callable type. It's a callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callable callability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable. Derick Rethans 3:01 And it's guaranteed to always be callable. Nikita Popov 3:03 Yeah, exactly. Derick Rethans 3:04 What does the syntax like? Nikita Popov 3:06 The syntax is the funny bit. As a bit of context. This proposal was created as an alternative or as a subset of the partial function application RFC. Derick Rethans 3:17 That is just as hard to pronounce as first class callable syntax RFC. Nikita Popov 3:21 Yes, that's why we say PFA. The PFA RFC has a more general feature. It also allows you to create a reference to a callable as a side effect. But more generally, it allows you to also bind some of the arguments to a fixed value. And has like finer control over for example, you can create a callable that has three required parameters, by passing three question mark arguments. While the new syntax only allows you to use the signature of the original function. But the syntax between both of those is compatible. So the new RFC is a subset of PFA. And that's why it uses the syntax where you do a normal function call, but then pass three dots or an ellipsis as arguments. Derick Rethans 4:08 Instead of passing the function's or method's normal arguments, you use the three dots. Nikita Popov 4:14 I think like the way to think about the syntax is that this is similar to like a variadic argument, or to the argument unpacking syntax, just that the arguments haven't yet been provided, they will be provided during the actual call. But I think the syntax was definitely the most contentious bit in the discussion of the RFC. I think this is mainly related to the fact that if you the see this code snippet, it looks a bit like, like the example code where the arguments haven't been filled in. While now this is like actual syntax. Derick Rethans 4:44 I'm sure there's quite a few tutorials out there explaining how PHP works by using dot dot dot. That is not something you can avoid. Nikita Popov 4:54 Well, we can avoid it, but it's fairly tricky question. I mean, the reason for this dot dot dot syntax, on one hand, this the compatibility with partial functions. I mean, the PFA, RFC has recently been declined. But in the future, we could extend the current syntax to full partial functions. And we would not end up with two different ways. So that's one benefit of the syntax. But the other part is that PHP has different symbol tables for different kinds of symbols. People often ask, why can't you just write like strlen as a plain name, not inside a string, and have that be treated as a reference to this function? And the answer to that is that we can't do that because you can't have a constant that's called strlen. Normally, that would be reference to constant and the same actually applies to all other callable types as well. So if you have something like methods, like object or method name, that would right now be interpreted as a property access. And for static methods, it will be interpreted as a as a class constant access. So we have this ambiguity here. Even if we add an additional symbol to this, for example, like for classes, we have the syntax, class name, and then scope operator class, that gives you the class name. We could do something like strlen, scope operator function, or fn, or whatever, and have that return the callable. That would work, but it also has some ambiguities. For example, if you have something like object, arrow methods, and then scope operator fn, you have this ambiguity. Is this referencing the method of that name? Or is it referencing a callable stored inside the property of that name? This is like fundamentally ambiguous. The way we would resolve it is we will just say that this index is only usable with real simple, so it will always refer to a method, and you couldn't use the syntax to convert the callable stored in a property into a proper callable. I'm actually not sure how I should distinguish these two concepts, because we have the existing callable, strings and arrays, and the first class callables, which are really closure objects. Derick Rethans 7:11 Which actually sort of brings me to the next question which just popped in my head, which is: Does this first class scalable syntax, what is returned as return a closure or an existing callable type as we have now, with a callable type being a single string, or this array syntax that we now use. Nikita Popov 7:28 The syntax returns a closure. Actually, the syntax works essentially the same way as the closureFromCallable method. And we do need to return a closure otherwise, we don't get this behaviour where the scope is bound at the time where the callable is created, rather than called. I think maybe going forward, I would generally recommend that people use a closure type, instead of a callable type in type declarations. I mean, you already cannot use callable for property types. Exactly due to this problem that callability is context dependent. While we only forbid it in property types, the same general problem also exists for argument and return types. And especially with the new syntax being introduced here, I think it's best to use closure instead of callable in the future. Derick Rethans 8:18 Does that sort of mean that first class scalable syntax is syntactic sugar? Or does it do more than the closureFromCallable method? Nikita Popov 8:27 No, I think it's effectively just syntactic sugar for closureFromCallable. Derick Rethans 8:34 I'm actually not sure whether Xdebug is be able to do anything with these closure from callable things to begin with. So that is something I'm going to have to investigate. Nikita Popov 8:44 Be able as in like, display that it actually refers to a specific method rather than just some kind of closure? Derick Rethans 8:51 Yeah, because at the moment, it shows you the file name and the line numbers, it doesn't have a name right if you create normal closures, but in this case, it's important to know that it actually refers to specific methods, which is the same thing as the closureFromCallable syntax would also do, but I've never done anything with that. Nikita Popov 9:10 But I think there is a way to get like the underlying prototype for the closure, and you should be able to determine it from there. Derick Rethans 9:18 The first class callable syntax, are there situations where you can't use it? Nikita Popov 9:22 One place where you don't want to use the new syntax as if you don't want to actually create a closure object, and validate callability at the point of creation. For example, creating this first class callable also implies that you have to autoload the class for a static method. If you have some kind of like large definition of of handler, of static handler methods for routes or something like that, then using the first class callable syntax would imply that you have to immediately create closure objects for all of these and immediately load all those classes. That's a use case where you might want to stick with the old syntax. Derick Rethans 10:01 But wouldn't opcache resolve that issue really? Nikita Popov 10:04 No, opcache is really exactly the reason why you wouldn't want to do that. For example, for my fastroute library, I cache all the data as a static array. And that's something that OpCache can cache very efficiently because it's in shared memory and accessing it is essentially zero cost. If you include something like first class callables in it, then those have to always be created at runtime, because we don't have concept like, like a persistent object. That means that this can no longer, I mean, the whole script can be in shared memory, but it still has to be executed always at runtime to construct the whole data structure. And that's going to be less efficient. To give a more clear answer to your question is that the first class callable syntax has a cost when creating the callable, and if you are in a situation where avoiding that cost is really critical for performance, that's why you wouldn't want to use it. Derick Rethans 11:00 And instead you'd have to use the old scalable syntax that we already have. Nikita Popov 11:03 Exactly. So for that reason, I think that the old syntax is not going to be removed in the near future at least, though maybe we can deprecate certain aspects of it. For example, the syntax also allows you to do highly context dependent things like referencing self, which is even worse than the situation with a private method, because self could refer to something different every time you call it. Those are some things we might want to deprecate early, but the main syntax itself was probably going to stay for a while. Derick Rethans 11:34 Because callability is checked when you create the closures does that mean it also checks for strictness then? If your PHP file has been declared with strict types? Nikita Popov 11:45 Strictness is handled the same as with closureFromCallable.The strictness is still determined at the time where the call is made, not where the callable was created, which actually, I am not a fan of how PHP handles strict types together with dynamic calls. But that's like a pre existing problem. And this isn't touching on this. Derick Rethans 12:06 The language has many issues that probably could have been done better if it was designed from scratch. But that ship has sailed 26 years ago. Nikita Popov 12:15 The strict types are not quite that old. Derick Rethans 12:17 No, that is true. The language itself is of course. Derick Rethans 12:23 Okay, thank you very much then, for taking the time this morning to talk to me about first class scalable syntax. Nikita Popov 12:29 Thanks for having me, Derick. Derick Rethans 12:30 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: First Class Callable Syntax RFC: Partial Function Application Episode #89: Partial Function Applications Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 91: is_literal
PHP Internals News: Episode 91: is_literal London, UK Thursday, July 15th 2021, 09:19 BST In this episode of "PHP Internals News" I chat with Craig Francis (Twitter, GitHub, Website), and Joe Watkins (Twitter, GitHub, Website) about the "is_literal" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 91. Today I'm talking with Craig Francis and Joe Watkins, talking about the is_literal RFC that they have been proposing. Craig, would you please introduce yourself? Craig Francis 0:34 Hi, I'm Craig Francis. I've been a PHP developer for about 20 years, doing code auditing, pentesting, training. And I'm also the co-lead for the Bristol chapter of OWASP, which is the open web application security project. Derick Rethans 0:48 Very well. And Joe, will you introduce yourself as well, please? Joe Watkins 0:51 Hi, everyone. I'm Joe, the same Joe from last time. Derick Rethans 0:56 Well, it's good to have you back, Joe, and welcome to the podcast Craig. Let's dive straight in. What is the problem that this proposal's trying to resolve? Craig Francis 1:05 So we try to address the problem where injection vulnerabilities are being introduced by developers. When they use libraries incorrectly, we will have people using the libraries, but they still introduce injection vulnerabilities because they use it incorrectly. Derick Rethans 1:17 What is this RFC proposing? Craig Francis 1:19 We're providing a function for libraries to easily check that certain strings have been written by the developer. It's an idea developed by Christoph Kern in 2016. There is a link in the video, and the Google using this to prevent injection vulnerabilities in their Java and Go libraries. It works because libraries know how to handle these data safely, typically using parameterised queries, or escaping where appropriate, but they still require certain values to be written by the developer. So for example, when using a query a database, the developer might need to write a complex WHERE clause or maybe they're using functions like datediff, round, if null, although obviously, this function could be used by developers themselves if they want to, but the primary purpose is for the library to check these values. Derick Rethans 2:05 That is a method of doing it. What is this RFC adding to PHP itself? Craig Francis 2:09 It just simply provides a function which just returns true or false if the variable is a literal, and that's basically a string that was written by the developer. It's a bit like if you did is_int or is_string, it's just a different way of just sort of saying, has this variable been written by the developer? Derick Rethans 2:28 Is that basically it? Craig Francis 2:30 That's it? Yeah. Joe Watkins 2:32 It would also return true for variables that are the result of concatenation of other variables that would pass the is literal check. Now, this differs from Google, because they introduced that at the language level, but not only at the language level, at the idiom level. So that when you open a file that's got queries in PHP, commonly, if they're long, basic concatenation is used to build the query and format it in the file so that it's readable. So that it wouldn't really be very useful if those queries that you see everywhere in stuff like PHPMyAdmin, and WordPress, and Drupal and just normal code weren't considered literal, just because they're spread over several lines with the concatenation operator. It's strictly not just stuff that's written by the programmer, but also stuff that was written by the programmer or concatenated, with other stuff that was written by the programmer. Derick Rethans 3:33 Now in the past, we have seen something about adding taint supports to PHP, right? How is this different, or perhaps similar, to taint checking? Craig Francis 3:44 At the moment today, there is a taint extension, which is something you need to go out your way to install, and actually learn about and how to use. But the main difference is that taint checking goes on the basis of say, this variable is safe or unsafe. And the problem is that it considers anything that had been through an escaping function like html_entities as safe. But of course, the problem is that escaping is difficult. And it's very easy to make mistakes with that. A classic example is if you take a value from a user, an SSH SSH, their homepage URL, if you use HTML encoding, and then put it into the href attribute of a link, that can also result in HTML injection vulnerability, because the escaping is not aware of the context which is used. Because if the evil user put in a JavaScript URL, that is in inline JavaScript, that has created a problem because taint checking would assume that because you use HTML encoding it is safe, and all I'm saying is that is it creates a false sense of security. And by stripping out all that support for escaping, it means that you can focus on libraries doing that work because they know the context, they understand the domain, and we can just keep it a much simpler, and much safer approach. Derick Rethans 5:02 Would you say that the is_literal feature is mostly aimed at library authors and not individual developers? Craig Francis 5:09 Yeah, exactly. Because the library authors know what they're doing. They're using well tested code, many eyes over it. The problem libraries have at the moment is that they trust the developer to write things themselves. And unfortunately, developers introduce a lot of injection vulnerabilities with those strings before they even get into the library. Derick Rethans 5:30 How would a library deal with with strings that aren't literal then? Craig Francis 5:35 So it really depends on each individual example. And the RFC does include quite a lot of examples of how each one will be dealt with. The classic one is, let's say you're sorting by a column in a database, because if we're dealing with SQL, the field name might come from the user. But that is also quite a risky thing to do if you start including whatever field name the user wrote. So in the RFC, I've created a very simple example where the developer would create an array of fields that you can sort by, and then whatever the user provides, you search through that array, and you pull out the one that you that matches and is fine. And therefore you are pulling out a literal and including into the SQL. To be fair, these ones are quite unique. And each one needs to be dealt with in its own way. But I've yet to find an example where you can't do it with a literal. Having said that, I think Larry Garfield actually gave an example where a content management system changed its database structure. And the way that would work is the library would have to deal with it, they would receive the value for a field, and then that field would be escaped and treated as a field, it understands it as a field, and it will process it as such, then it can include into the SQL, knowing full well that everything else in that SQL is a literal, and then it can just build up SQL in its own way internally. Derick Rethans 6:58 Okay, talking a little bit about the implementation here. Since PHP seven, we have this concept of interned strings, or maybe even before that actually, I don't quite remember. Which is pretty much a flag on each string and PHP that says, this's been created by the engine, or by coconut. Why would strings have to have an extra flag here to remember that it is created by the programmer? Joe Watkins 7:21 Well, interned does not mean literal. It's an optimization in the engine, should we use strings. We're free to do whatever we want with that. At the moment, it by happenstance, most interned strings are those written by the programmer. If you think about the sort of strings that are written by the programmer, like a class name, when those things are declared internally, by an extension, or by core code, those things are interned as if they were written by the programmer. They don't mean literal, we're free to use interned strings for whatever we want. For example, a while ago, someone suggested that we should intern keys while JSON decoding or unserializing. It didn't happen, but it could happen. And then we'd have the problem of, well, how do we separate out all this other input. There is another optimization attached to interned strings, which is one character strings, where if you type only one character, or you call a Class A or B, or whatever, the permanent interned string will be used. That results in when the chr function is called, that results in the return of that function always being marked as interned. So it would show as literal, which is not a very nice side effect. And that's just a side effect that we can see today. We don't want to reuse the string really, it does need to be distinct. Also, if you're going to concatenate, whether you do it with the VM or a specific function, obviously, you need to be able to distinguish between an interned string and a literal string, which interned means it has a specific life cycle and specific value. And we can't break that. Derick Rethans 9:00 So there are really two different concepts, is what you're saying, and hence, they need to have a special flag for that? Joe Watkins 9:06 Yeah, they're very, two very separate concepts. And we don't we don't want to restrict the future of what interned strings may be used for. We don't want to muddy the concept of a literal. Derick Rethans 9:16 Of course, any sort of mechanism that languages built into solve or prevent injections in any sort of form, there's always ways around it. Theoretically, how would you go around the is_literal checks to still get a user inputted value into something that passes the is_literal check? Craig Francis 9:36 Generally speaking, you would never need it because the library should know how to deal with every scenario anyway. And it's not that difficult. We're only talking about things like in the database world, you'll be taking value from field names and therefore it should receive field names or table names. And, you know, we are providing a guardrail as a safety net. And what should happen is that the default way in which programmers work should guide them, to do it the right way. We're not saying that you can't do weird things to intentionally work around this. A really ugly version, which you should never do, but use eval and var_export together, it's horrible. But if you are so desperate, you need to get around this. That's what we're doing it. But in reality, we can't find any examples where you'd actually need to do this. Joe Watkins 10:22 I would say that, hey, there's this idea that most people writing PHP are using libraries, and they're using frameworks. I don't actually find that to be true. I've been working in PHP for a long time. And most of the big projects I've worked on for a long time did not start out using frameworks. And they did not start out using libraries. They look a bit like that today, but their core, they are custom. There may be a framework buried in there. But there is so much code that the framework is a component and is not the main deal. Most code, we actually do write ourselves, because that's what we're paid to do. I think we don't decide how people are going to use it, and we don't decide where they're going to use it. The fact is, like Craig said, it's a guardrail that you can work around easily. And if you find a use case for doing that, then we shouldn't prejudge, and say, well, that's the wrong thing to do. It might not be the wrong thing to do. For example, an earlier version of the idea included support for integers. We considered integers safe, regardless of their source. If you wanted to do that, in your application, you could do that very easily and still retain the integrity of the guardrail is not compromised. I wouldn't focus on this is for libraries, and this is for frameworks, because these things become so small in the scheme of things that they're meaningless. I mean, most of the code we work on is code that we wrote, it is not frameworks. Derick Rethans 11:48 That also nicely answers my next question, which is what's happened to integers, which have now nicely covered. The RFC talks about that as hard to educate people to do the right thing. And that is_literal is more focused, so to say, on libraries, and perhaps query building frameworks as the RFC alludes to. But I would say that most of these query building tools or libraries already deal with escaping from input value. So why would it make sense for them to start using is_literal if you're handling most of these cases already anyway? Craig Francis 12:24 If you look at the intro of the RFC, there's a link to show examples of how libraries currently receive the strings. And you're right about the Query Builder approach is a risky thing, I would still argue it's an important part. That's why libraries still provide them. Doctrine has a nice example of DQL. The doctrine query language is an abstraction that they've created, which is also vulnerable to injection vulnerabilities. And it gives the developer a lot more control over a very basic API. I still think people should try and use the higher level API's because they do provide a nice safe default, but that depends on which library use, they're not always safe by default. So for example, when you're sort of saying: I want to find all records where field parameter one, is equal to value two, a lot of the libraries assumed that the first parameter there is safe and written by the developer. They can't just necessarily simply escape it as though it's a field because that value might be something like date, bracket, field, bracket, and it's sort of relying on the developer to write that correctly, and not make any mistakes. And that hasn't proven to be the case, you know, they do include user values in there. Derick Rethans 13:43 Just going back a little bit about some of the feedback, because feedback to the RFC has happened for quite some time now. And there were lots of different approaches first tried as well, and suggested to add additional functions and stuff like that. So what's been the major pushback to this latest iteration of the RFC? Joe Watkins 14:01 So I think the most pushback has come from an earlier suggestion that we could allow integers to be concatenated and considered literal. We experimented with that, and it is possible, but in order to make it possible, you have to disable an optimization in the engine, that would not be an acceptable implementation detail for Dmitri. It turns out we didn't actually, we don't need to track their source technically, but it made people extremely uncomfortable when we said that, and even when we got an independent security expert to comment on the RFC, and he tried to explain that it was no problem, but it was just not accepted by the general public. I'm not sure why. Derick Rethans 14:45 All right. Do you have anything to add Craig? Craig Francis 14:48 The explanation given by people is they liked the simpler definition of what that was as if it's a string written by the developer. Once you start introducing integers from any source, while it is safe, it made people feel, yeah, what is this. And that's where we also had the slight issue because we had to find a new name for it. And I did the silly thing of sort of asking for suggestions, and then bringing up a vote. And then we had, I think it's 18 to three people saying that it should be called is_trusted, and you have that sinking moment of going, Oh, this is going to cause problems, but hey, democracy. It creates that illusion that it's something more. So that's why we sort of went actually, while I like Scott's idea of having the idea of maybe calling it is_noble. It is a vague concept, which people have to understand. And it's a bit strange. Whereas going back to the simpler, original example, they've all seem to grasp grasp of that one. And we could just keep with the original name of is_literal, which I've not heard any real complaints about. Derick Rethans 15:53 I think some people were equivalenting is_trusted with something that we've had before in PHP called Safe mode, which was anything but of course. Craig Francis 16:02 Yes, no, definitely. Derick Rethans 16:03 We're sort of coming to the end of what to chat about here. Does the introduction of is literal introduce any BC breaks? Craig Francis 16:11 Only if the user land version of is_literal, which I'm fairly sure is going to be unlikely. So on dividing their own function called that. Derick Rethans 16:18 Did you check for it? Craig Francis 16:20 Yes. Derick Rethans 16:21 So if you haven't found it, then it's unlikely to to exist. Craig Francis 16:24 There are still private repositories, we can't shop through all their show, check through all their code. But yeah. Derick Rethans 16:29 Did I miss anything? Craig Francis 16:31 We covered future scope, which is the potential for a first class type, which I think would be useful for IDs and static analysers. But this is very much a secondary discussion, because that could build on things like intersection types, but we still need to focus on what the flag does. And there's also possibility of using this with the native functions themselves, but we do have to be careful with that one, because, you know, we got things like PHPMyAdmin. We have to be able to make the output from libraries as trusted because they're unlikely to still be providing a literal string at the end of it. So that's a discussion for the future. And the only other thing is that, you know, the vote ends on the 19th of July. Derick Rethans 17:08 Which is the upcoming Monday. How is the vote going? Are you confident that it will pass? Craig Francis 17:13 Not at the moment, we're sort of trying to talk to the people who voted against it. And we've not actually had any complaints as such. The only person who sort of mentioned anything was saying that we should rely on documentation and the documentation is already there. And it's not working. I think a lot of people just voted no, because they just sort of going well, that's the safe default. I don't think it's necessary. Or, you know, I'd like the status quo. And we still are trying to sell the idea and say: Look, it's really simple. It's not really having a performance impact. And it can really help libraries solve a problem, which is actually happening. Derick Rethans 17:46 Is this something that came out of the people that write PHP libraries or something that you came up with? Craig Francis 17:52 So I've come gone to the library authors and suggested you know, this is how Google do it. Would you like something similar? And we've certainly had red bean and Propel ORM saw show positive support for that. And I've also talked to Matthew Brown, who works on the Psalm static checking analysis. He's very positive about it, so much so that Psalm now also includes this as well. Obviously, static analysis is not going to be used by everyone. So we would like to bring this back to PHP so that libraries can use it without relying on all developers using static analysis. Derick Rethans 18:25 Thank you very much. Glad that you were both here to explain what this is_literal RFC is about. Craig Francis 18:31 Thank you very much, Derick. Joe Watkins 18:33 Thanks for having us. Derick Rethans 18:37 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: is_literal Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 90: Read Only Properties
PHP Internals News: Episode 90: Read Only Properties London, UK Thursday, July 8th 2021, 09:18 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Read Only Properties" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 90. Today I'm talking with Nikita Popov about the read only properties version two RFC that he's proposing. Nikita, would you please introduce yourself? Nikita Popov 0:33 Hi, Derick. I'm Nikita and I do PHP core development work by JetBrains. Derick Rethans 0:39 What does this RFC proposing? Nikita Popov 0:41 This RFC is proposing read only properties, which means that the property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have typed properties. A remaining problem with them is that people are not confident making public type properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value for example. One nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP. Derick Rethans 1:35 You mean the immutable object? Nikita Popov 1:37 Sorry, immutable. And read only properties address that case. So you can simply put a public read only typed property in your class, and then it can be initialized once in the constructor and you can be... You don't have to be afraid that someone outside the class is going to modify it afterwards. That's the basic premise of this RFC. Derick Rethans 1:57 But it also means that objects of the class itself can modify that value any more, either. Nikita Popov 2:01 Exactly. So that's, I think, a primary distinction we have to make. Genuinely, there are two ways to make this read only concept work. One is like actually read only or maybe more precisely init once, which is what this RFC proposes. We can only set that once and then even in the same class, you can't modify it again. And the alternative is the asymmetric visibility approach where you say that, okay, only in the public scope, the property can only be read, but in the private scope, you can modify it. I think the distinction there is very important, because read only property tells you that it's genuinely read only, like, if you access a property multiple times in sequence, you will always get back the same value. While the asymmetric visibility only says that the public interface is read only, but internally, it could be mutated. And that might like be, you know, intentional, just that you want to like have your state management private, but that the property is not supposed to be immutable. Derick Rethans 3:05 How's this RFC different from read only properties, version one? Nikita Popov 3:09 Read only properties version one was called write once properties. I think the naming is kind of one of the more important differences. The new RFC is also effectively write once, but I think it's really important to view it from an API perspective as read only because that's what the user gets to see. While write once gives you this impression, that is know that you can externally from outside the class like passing the value once I know like dependency injection, that is what they would think of when they hear write ones. And from the technical site, there is a related difference. And that difference is that new RFC only allows you to initialize read only properties inside the class scope. That means if you do something really weird, like leaving a property uninitialized in the constructor, it's not possible for someone outside the class to initialize it instead, just like an extra safety check. Of course, you can, as usual bypass that, like if you're writing a serializer or hydrator, you can use reflection to initialize it outside the class. But normally you won't be able to. Derick Rethans 4:13 Does that mean that these read only properties can also be initialized from a normal method instead of just from the constructor? Nikita Popov 4:19 That's true. Yes, that's possible. Derick Rethans 4:21 So the RFC talks about that a read only property cannot be assigned from outside a class. Does that mean it can be set by a different object of the same class? Nikita Popov 4:29 Yeah, that's how scoping works in PHP. Scoping is always class based, not object based, it's a common misconception that if you have like private scope, you can access different objects of the same class. Derick Rethans 4:42 That was a surprise to me the first time I ran into that, but once you know, it's obvious that it's should work all over the place right. Now, the RFC states that you can only use read only with typed properties. Why is that? Nikita Popov 4:55 So this is related to the initialisation concept, that typed introduced. Typed properties start out, if you don't give them a default value, they start out in an uninitialized state. And we reuse that state for read only properties. You can only assign to the property while it's uninitialized. And once it's initialized, you cannot assigned to it any more or even unset it back to an uninitialised state. For non typed properties, you also can get into this uninitialized state by explicitly unsetting it. But the problem is that this is not the default state. Untyped properties always have no default value, even if you don't specify one, which effectively means that these properties are always initialized. So they will be kind of useless if you used read only with them. Which is why we make this distinction to avoid any confusion. And if you want to use an untyped read only property, you do that by using the mixed type, which is the same but has the initialisation semantics of typed properties. Derick Rethans 5:56 What would that mean if you have say, a resource or class typed property with a read only keyword? Can you not read or write to a resource any more? Or modify properties on an object, that is the value of a read only typed property? Nikita Popov 6:12 No, you can still modify those, because we have to distinguish the concepts of like exterior and interior mutability here. So objects and resources are... Well, I mean, we often say they are passed by reference, which is not strictly true, because those are not PHP references. But the important part is that they only pass around some kind of handle and you can still modify the inside of that handle. What you can't do is you can't reassign to a different resource or reassign to a different object, it's always the same object, the same resource, but the insides of the objects, those can change. Of course, if your object also only contains read only properties, then you won't be able to change this. Derick Rethans 6:55 Okay, and that answered another question that I have: how it possible to make the whole object read only or on all of its properties? Where the answer is by setting all the properties to read only. Nikita Popov 7:05 We could like add a read only class modifier that makes all the properties implicitly read only but maybe future scope. Derick Rethans 7:13 Is there a reason why read only properties can't have a default value? Nikita Popov 7:18 So this is, again, same issue with initialization. If they have a default value, then they're already initialized. So you can't overwrite it. Like we could allow it, but you just would never be able to change them from the default value, which is something we could allow it just wouldn't be very useful. Derick Rethans 7:36 in PHP, eight zero PHP introduced promoted or constructor promoted properties, I think it's the full name of it. How does this read only property tie in with that? Because can you set the read only flag on a constructor promoted property? Nikita Popov 7:49 Yeah, you can. So that works as expected. Important bit there is that with the promoted properties, you can set the default value. And the reason is that for promoted properties, the default value is not the default value of the property. It's the default value of the parameter. And that works just fine. Derick Rethans 8:07 But again, it would be a constant. Nikita Popov 8:10 No, in that case, it's not a constant, because it's just a default value for the parameter and the code we generate, we just assign this parameter to the property IN the constructor. Derick Rethans 8:19 Which means that if you instantiate the class with different arguments, they would of course, override the default value of the constructor argument value, right? Nikita Popov 8:28 That's right. If you pass it explicitly, then we use the explicit value. If you don't pass an argument, then we use the default of the argument, but it still gets assigned to the property through the like automatically generated code for the promotion. Derick Rethans 8:42 Promotion isn't actually... there isn't actually really a language feature, but more of a copy and paste mechanism. Nikita Popov 8:50 Yes, this is pure bit of syntax sugar. Derick Rethans 8:53 Which sometimes can be handy. But all kinds of interesting things that we added to properties or type system in general, usually inheritance comes into play. Are there any issues here with read only and inheritance? What does it do to variants or traits or things like that, all the usual things that we need to take into consideration? Nikita Popov 9:13 Most of our rules for properties, most of our inheritance rules for properties are invariant, which means that the property in the child class has to basically look the same as the property in the parent class. And this is also true for read only properties. So we say that if the parent property is read only the child property has to be read only, and the same the other direction and for the type as well. So we say that the type of the read only property has to match with the parent property. Those rules are very conservative and like they could theoretically maybe be relaxed in some cases. For read only properties on read only is like kind of return type and return types in PHP are covariant. So one could argue that the type should also be covariant here The problem is that like here, we get into this little detail that read only is really not read only, but init once. So there is this one assignment. And assignment is more like an argument. So is contravariant. So if we made them covariant, then we would basically lie about this one initializing assignment. And we don't like to lie about these things, at least without some further consideration. Derick Rethans 10:24 Read only doesn't change the variance rules at all, which is different from the property accessors RFC we spoke about a couple of months ago. Talking about that this is actually a competing RFC, or do they tie together? Nikita Popov 10:37 Well, I should first say that probably the variance stuff defined in the accessors RFC maybe isn't right, for the same reason and should also like just keep things invariant. But it's more complicated there because it also has abstract properties, or abstract accessors. And then the abstract case, that's where the different variance rules are safe. To get back to your question, they are kind of competing, but could also be seen as just complimentary. So read only is basically a small subset of the accessors feature. I mean, accessors also allow you to implement read only properties as part of a much more general framework. And read only properties cover like only this small but very useful corner in a way that's much simpler. And I think that's not just simpler from a technical perspective, but also simpler to understand for the programmer, because there are a lot of less edge cases involved. Derick Rethans 11:32 Because we're getting pretty close to feature freeze now. Are you still intending to take the property accessors RFC forward for PHP eight one? Nikita Popov 11:41 No, definitely not. Derick Rethans 11:43 But you have taken the read only properties one forwards because we're already voting on it. At the moment, it looks like the vote is going to pass quite easily. So there's that. We don't have to speculate about that, like we had to do with previous RFCs. Do you think I've missed anything talking about read only properties? Or have we covered everything? Nikita Popov 12:01 We've missed one important bit. And that's the cloning. So the read only properties RFC is not entirely uncontroversial. And basically all the controversy is related to cloning. The problem are wither methods as they're used by PSR seven, for example, which are commonly implemented by cloning the original object, and then modifying one property in it. This doesn't work with read only properties, because you know, if you clone the object, then it already has all the properties initialized. So if you try to change something, then you'll get an error that you're modifying read only property. So these patterns are pretty fundamentally incompatible. There are possible solutions to the problem primarily a dedicated syntax that goes into the code name of cloneWith where you clone an object, but override certain properties directly as part of the syntax, which could bypass the read only property checks. The contention here is basically there are some people consider this clone support and these wither methods are very important. Well, I mean, read only properties are about immutable objects. And like PSR seven is a fairly popular read only object pattern, they think that without support for cloning or for withers, the feature loses all value. That's why the alternative suggestions are either to first introduce some more cloning functionality. So like this mentioned, cloneWith or instead implement asymmetric visibility, which does not have this problem because inside the class, you can always modify the property. So it works perfectly fine with cloning. My personal view on this is well I personally use cloning in PHP approximately never. So this is not a big loss for me, and I'm happy for this from my perspective, edge case, to be addressed at a later time. I can understand that other people put more value on this then I do. Derick Rethans 13:55 There doesn't seem to be enough push against that for this RFC to fail. So there is that. Thanks Nikita for explaining the read only properties RFC which will very likely see in PHP eight one. Thanks for taking the time. Nikita Popov 14:08 Thanks for having me Derick, once again. Derick Rethans 14:13 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Read Only Properties RFC: Write Once Properties Episode #44: Write Once Properties Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 89: Partial Function Applications
PHP Internals News: Episode 89: Partial Function Applications London, UK Thursday, June 17th 2021, 09:17 BST In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Joe Watkins (Twitter, GitHub, Blog about the "Partial Function Applications" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 89. Today I'm talking with Larry Garfield and Joe Watkins about a partial function application RFC that they're proposing with Paul Crevela and Levi Morrison. Larry, would you please introduce yourself? Larry Garfield 0:36 Hello World. I'm Larry Garfield or Crell on most social medias. I'm a staff engineer for Typo3 the CMS. And I've been getting more involved in internals these days, mostly as a general nudge and project manager. Derick Rethans 0:52 And hello, Joe, would you please introduce yourself as well? Joe Watkins 0:55 Hi, I'm Joe, or Krakjoe, I do various PHP stuff. That's all there is to say about that really. Derick Rethans 1:02 I think you do quite a bit more than just a little bit. In any case, I think for this RFC, you, you wrote the implementation of it, whereas Larry, as he said, did some of the project management, I'm sure there's more to it than I've just paraphrased in a single sentence. But can one of you explain in one sentence, or if you must, maybe two or three, what partial function applications, or I hope for short, partials are? Larry Garfield 1:27 Partial function application, in the broadest sense, is taking a function that has some number of parameters, and making a new function that pre fills some of those parameters. So if you have a function that takes four parameters, or four arguments, you can produce a new function that takes two arguments. And those other two you've already provided a value for in advance. Derick Rethans 1:54 Okay, I feel we'll get into the details in a moment. But what are its main benefits of doing this? What would you use this for? Larry Garfield 2:01 Oh, there's a couple of places that you can use partial application. It is what got me interested. It's very common in functional programming. But it's also really helpful when you want to, you have a function that like, let's say, string replace takes three arguments, two of which are instructions for what to replace, and one of which is the thing in which you want to replace. If you want to reuse that a bunch of times, you could build an object and pass in constructor values and save those and then call a function. Or you can just partially apply string replace with the things to search for, and the things to replace with and get back a function that takes one argument and will do that replacement on it. And you can then reuse that over and over again. There are a lot of cases like that, usually use in combination with functions that wants a callback. And that callback takes one argument. So array map or array filter are cases where very often you want to give it a function that takes one argument, you have a function that takes three arguments, you want to fill in those first ones first, and then pass the result that only takes one argument to array map or a filter, or whatever. So that's the one of the common use cases for it. Derick Rethans 3:15 That's the benefits and some of its background comes from functional programming, as you've just mentioned. What is the syntax that you're proposing and some of the semantics? Larry Garfield 3:26 The syntax that we've developed, are two placeholders that you can use in a function call. So if you're calling a function as you normally would, but for one of the arguments, you pass a question mark, or at the tail end, you have an ellipsis (dot dot dot), then that tells the engine: This is not a function call. This is a partial application. And what it will do is return not the result of the function but return a closure object that has the the arguments that correspond to those question marks. And then when called with those arguments, we'll pass those along with the original function. Probably easier to explain, if I use a concrete example, using the string replace example we talked about before, you would call it with str_replace, the example from the RFC, hello, hi, question mark. What that gives you is a callable, a closure that has one argument, which will take its type and name from str_replace. So the third argument to str_replace essentially gets copied into that closure. And what closure does internally when you call it with that one argument is it just calls string replace with hello, hi, and whatever argument you gave it and returns that value. It is conceptually very, very similar to just writing a short lambda or an arrow function that takes one arguments and calls string replace hello, hi, and that argument. In most cases, it ends up functioning almost exactly like that. There's a few subtle differences in a few places. But most of the time, you can think of it working essentially like that. The question mark means one required argument only. The dot dot dot means zero or more arguments, if you want to, say provide the first argument to a function, and then dot dot dot would mean: And then all of the other arguments, however many there are, even if it's that zero, those are what's left, which languages other languages that have partial application as a first class feature, usually end up doing it that way where you can only pre fill from the left. PHP, because the placeholder lets us do it in any order. So we can skip over arguments if we want to, which is quite nice. But it means that you can take a function and reduce it to, I want to prefill just these two arguments and leave these three arguments for the new function, or I want to prefill these arguments from the left, and then everything else, whatever it is, is left. It also lets you do cute things like if you provide all of the arguments to a function, and then just tack on a dot dot dot the end of it, then you get back a closure that takes essentially zero arguments. But when called, will call that other function. So it's lets lets you really easily build a delayed function as you need to. Derick Rethans 6:15 When do the arguments to the function get evaluated then? Larry Garfield 6:18 Arguments are evaluated in advance. So this is the subtle difference between partial application and the short lambda syntax. In a short lambda, what happens is, essentially, that entire expression on the right hand side gets wrapped up into a closure. And so any arguments that are compound like they have a function call that is inside one of the placeholders, or one of the arguments, that'll get evaluated later. With partial application, the function that is in a parameter position gets evaluated first and reduced to a value. And that value gets partially applied to the function. 90% of the time, that's not going to be an issue. There are a few cases where doing it one way or the other may be subtly different, but you'll spot those fairly easily. Derick Rethans 7:02 So the RFC talks about things that you can do, but also a few things that you cannot do or don't want to do yet. What are these things that partials won't support, or run support yet, at least? Larry Garfield 7:13 The main thing that it doesn't support is named placeholders. You can pre fill a value or an argument with a named named argument. But not a named placeholder. Those have to be positional. Named placeholders are complicated to implement, and run into a question of, if you provide those in a different order, does that also change the order of the arguments in the partially applied function that you get back in that closure? And there's a good argument to be made that either way is logical. And so we're like, no, does not deal with it, too complicated. We'll just positional only. And you cannot specify an optional arguments either. It's just again, too complicated. Things get too weird. If you have those advanced cases, use our short lambda, that works just fine. If you want to just make a new function that defers to a new function, and change its API in the process, short lambda works fine. And it's still quite short. Derick Rethans 8:13 I know the RFC talks a little bit about references, but I don't like talking about references. So let's skip that part. In my opinion, they should be removed from the language. But I know we can't. Larry Garfield 8:22 There's occasionally used for them. But very occasionally. Derick Rethans 8:25 There's a bunch of technical things that I also want to chat about. And hopefully, Joe, if you want to fill in, I'd be more than welcome to hear your opinions on these things. But the first one is that PHP has this thing called func_get_args. How does that work with these partials? How does that tie in together? Joe Watkins 8:42 It should mostly behave as if you've invoked the function directly. We don't want there to be a huge discrepancy between. The callee know whether they've been called through partial application or complete application. It should be the same. Derick Rethans 8:58 That is good to know. I mean, I always like it how things work as people expect them to work, right? Joe Watkins 9:03 Yeah. Derick Rethans 9:04 We already have used the dot dot dot operator for variadics. But you're reusing the dot dot dot, or ellipses, as you more eloquently call it earlier. Here again, as well, is that not going to cause issues? Or does that tie in well together? Joe Watkins 9:18 Well, there's quite a lot of debate about what's the right symbol to use. I think it's dot dot dot, and I think Larry agrees with me. But there's some people who want to stick an extra question mark on the end, which to me looks like it reads zero to one. And to Larry, it looks like an extra character that's just not needed. Other people say it makes sense for them. But if you can type three characters and not four, I mean, you need a really good argument. The arguments that have been put forward so far don't really make very much sense for me. Maybe we should ask that question and it doesn't really matter. In the end, what the syntax is, is if it's a difference between it getting in and not getting in, then we'll just put the extra question mark on there. I don't really have a really good argument to change it like to be like that. Derick Rethans 10:05 To be honest, to me, it looks like you then have two placeholders. Joe Watkins 10:09 Yeah. Derick Rethans 10:10 I don't feel the need for it. Joe Watkins 10:11 That's also another argument because we've introduced this one symbol, and then this other symbol, and then you put them together. And that's two things. I mean, you can't have one and one equals one. Derick Rethans 10:20 Fair enough. The RFC does touch on another quite interesting thing, I think, which is constructors, which it also be able to partially apply. But of course, you've mentioned that, that arguments get applied immediately when you do the substitution, when you do the partial application. But of course, the constructor is a bit weird because a constructor runs immediately after an object has been constructed. So how does that work together with partials? Joe Watkins 10:47 So at first, we made it so like if you invoke a constructor with reflection, and you just invoke it over and over again, it'll invoke it on the same object, you won't get back a new object. It's not the constructor that returns the object, it's the new operator. So first, we had a bit dumb. And we did just like what reflection does. And if you applied to a constructor, you'd get back a closure that just repeatedly invokes the constructor, which is, as Larry called it, quite naive. So we went back and revisited that. And so now it acts like a factory. Every time you invoke the closure return from an application, you get a brand new object, which is more in line with what people expect. And it's also quite cool. It's one of my favourite bits, actually as it turns out. Derick Rethans 11:31 In my opinion, it also makes more sense than then having an apply to the same object over and over again. Whether I'd like it or not, I don't know yet. Joe Watkins 11:39 Oh, the other option is traditional constructors to avoid the surprising behaviour. But that would be just a strange. Larry Garfield 11:45 There are a lot of use cases where you want to take a bunch of values, convert them to objects using an array map, supporting constructors for that makes total sense to me. Derick Rethans 11:54 And I would probably say, though, that I would prefer not allowing it over it applying over the same object over again. You've touched a little bit on some common cases where you want to use this, do you perhaps have some other ideas where this might be really useful? Larry Garfield 12:10 So there's three use cases that we think are probably going to be the lion's share. One is to just use the dot dot dot operator. So you have some function or method call, call it with dot dot dot, and that's it. You prefill nothing, which gives you back a closure that is identical in signature to the function or the method that you're applying it to. Everything we've said about functions applies the methods here as well. Which means we now effectively have a new way to refer to a function or a method and make a callable out of it, that doesn't involve just sticking it into a string. You just say, hey, function called dot dot dot, or an arrow bar, parentheses, dot dot dot, parentheses. And now you can turn any function or method into a callable and pass that around. And it's still, it's not wrapped up into the silly array format, it's still accessible to static analysers and refactoring tools. Hopefully, with this, you will never need to refer to a function name using a string ever again, never refer to a method call as an array of object and method. So that that just is not needed any more in the vast majority of cases. Derick Rethans 13:20 That alone is probably worth having them, maybe. Larry Garfield 13:23 And Nikita had an RFC that was doing just that, and nothing else. It's kind of a junior version of this. I don't think that's necessary, the full full scope here works, and gives us that. The second use case that I think is going to be common are unary functions. That's functions that take a single argument. More to the point, as I mentioned before, a lot of functions take a callback. And that callback needs a single argument, array map, array filter, some validation routines, a lot of other things like that. So it's now stupidly easy to take any arbitrary function or method and turn it into a single parameter function, which you can then pass as a callback to array map, array filter, all these other tools, and it just becomes really easy to pre fill things that way. The third is the other one I mentioned earlier, if you pre fill all the arguments, and then just put a dot dot dot at the very end, which means zero or more, you now have a function that takes no arguments, but calls the original function you specified with all the arguments you specified. This often the case for default values, where I want to have a default value available, but don't want to take the time to compute it in advance because it might be expensive. Whatever function it is that will determine that default value, I just partially apply that and give it all the arguments and I get back a callable. That creating a callable is dirt cheap, but when I actually need that value, I can then call it at that time, but it won't actually get called unless I need it. That's another use case that we expect to be common. There are no doubt others that we haven't thought of, or that will be less common, but still useful. I think this will probably replace a large chunk of the use cases for short lambdas. Not because short lambdas are bad, they're wonderful. But so many of them convert a function to a simpler function. And this gives us an even more compact, more readable syntax for that, with even less extra symbols and flotsam around it. Derick Rethans 15:24 I saw, hopefully as a joke, saying that, instead of using the question mark, we should use dollar sign dollar sign, and then we should call the token name T_BLING. Larry Garfield 15:36 This RFC actually has a storied history. Several years ago, Sara Golemon had proposed porting the pipe operator from Hack to PHP. The pipe operator is an operator available in a lot of different languages that lets you string together a series of functions. So you pass a function, pass an argument into one function, its results you pass to the next function, its results, you pass the next function and so on, which is a good case for unary functions. In Hack's syntax, they don't use a function on the right hand side, they use an arbitrary expression, and then dollar dollar as a placeholder for where to put the value from the left hand side from the previous step. It's the only language that does that. Derick Rethans 16:20 The other language that does it is bison. Larry Garfield 16:23 Or Bison also does that style of? Derick Rethans 16:25 It does something weird like that, yeah. Have a look at the grammar file. Larry Garfield 16:29 I've looked in there. It's scary. So at the time, she didn't actually put an implementation in for it. But there was some discussion about it. I joked that if she wanted to do that, she should call it T_BLING. And she thought it was hilarious, but never went anywhere. A year ago, I started working on a pipe operator RFC that did just the pipe part, but used a callable on the right hand side, instead of an expression, more like F#, and Haskell, and other languages that have a pipe operator. And their main response to that was, we'd like this, this is cool, except that just using short lambdas on the right all the time to make unaries is too ugly. We want partial application first. So I spent a while trying to bribe someone with more experience and knowledge than me to work on partial application. I tried bribing Ilya Tovolo, to do so by working with him on enumerations. And we got enumerations in, but he doesn't have the time to work on partial application. Levi and Paul had already written an RFC for partial application that had no implementation. It's just a skunkworks, essentially. Then a few weeks ago, Joe pops up and starts working on an implementation for partials. And I, to this day, don't know what interested him in it. But I'm very happy about this fact. So as we updated the RFC, I knew that people want a bike shed about syntax. So I threw that in as a joke. I don't think we're actually going to do that. It's just a little inside reference that is now no longer inside. Derick Rethans 17:56 Joe what made you work on partials, then? Joe Watkins 17:58 It's interesting to write. I've had my fun whether it gets in or not. Derick Rethans 18:02 Sometimes that's the case, right? So just working on this is all the fun. Larry Garfield 18:06 Sometimes it's fun to just run down rabbit holes for the heck of it. And sometimes really cool things can come out of that sometimes. Derick Rethans 18:12 At some point, I might have to implement support for partials like I have for closures in Xdebug as well. Because at some point, people might want to debug these things. So I'm a little bit interested in how do these the closures that it generates? Where does it store the already applied arguments? Joe Watkins 18:29 So partials have the same binary struct up to this point or of the closure, and then after that there's some extra fields. Derick Rethans 18:36 Would they still have the names? Joe Watkins 18:39 No, because named arguments aren't actually named, that information is lost. By the time we've got them, we don't have any name information. We've only got their correct position, according to the call that was made. Derick Rethans 18:50 And every argument that hasn't been filled and doesn't have a special placeholder in there, or does it keep track of which ones have been filled in? Joe Watkins 18:56 We've got two special placeholders internally, you won't see as undef or null or anything. Derick Rethans 19:02 Okay, that's good to know. What has the reaction been so far? Larry Garfield 19:05 Slightly positive. There were a lot of discussions early on about do we support argument reordering? And should it use a single placeholder or two separate placeholders? Originally, we had one and realized after a while, that doesn't actually work. There're use cases where that will be confusing. Overall, the feedback has been quite positive, and I fully expect that to pass. Really the only question people are still debating about at this point is ellipsis versus ellipses question mark. Joe Watkins 19:34 Yeah, I think the first version of the RFC was quite well received. Someone said we could document it as to make a partial sprinkle or question mark over it and hope for the best. Derick Rethans 19:44 Oh, that's good to hear. With feature freeze coming not pretty soon now. When do you think you're putting this up for a vote? Larry Garfield 19:51 Probably in the next couple of days. The only question I think is whether we include a second question for which variadic placeholder to use, which syntax/ Or if we just say it's dot dot dot, go away. Other than that it should go to a vote probably before this episode airs. Derick Rethans 20:06 Thank you very much, both of you for taking the time to me today to talk about partials. Larry Garfield 20:11 Thank you again Derick, hopefully see you once more on this season. Joe Watkins 20:15 Thanks Derick, see you soon. Derick Rethans 20:21 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Partial Function Applications Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0