This is the 'PHP Internals News' podcast, where we discuss the latest PHP news, implementations, and issues with PHP internals developers and other guests.

PHP Internals News: Episode 73: Enumerations

January 28, 2021 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 73: Enumerations London, UK Thursday, January 28th 2021, 09:01 GMT In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a new RFC that he is proposing together with Ilija Tovilo: Enumerations. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick and welcome to PHP internals news that podcast dedicated to explain the latest developments in the PHP language. Derick Rethans 0:22 This is Episode 73. Today I'm talking with Larry Garfield, who you might recognize from hits such as object ergonomics and short functions. Larry has worked together with Ilija Tovilo on an RFC titled enumerations, and I hope that Larry will explain to me what this is all about. Larry, would you please introduce yourself? Larry Garfield 0:43 Hello World, I'm Larry Garfield, I am director of developer experience at platform.sh. We're a continuous deployment cloud hosting company. I've been in and around PHP for 20, some odd years now. And mostly as an annoying gadfly and pedant. Derick Rethans 1:00 Well you say that but in the last few years you've been working together with other people on several RFCs right, so you're not really sitting as a fly on the wall any more, you're being actively participating now, which is why I end up talking to you now which is quite good isn't it. Larry Garfield 1:15 I'm not sure if the causal relationship is in that direction. Derick Rethans 1:18 In any case we are talking about enumerations or enums today. What are enumerations or enums? Larry Garfield 1:26 Enumerations or enums are a feature of a lot of programming languages, what they look like varies a lot depending on the language, but the basic concept is creating a type that has a fixed finite set of possible values. The classic example is Boolean; a Boolean is a type that has two and only two possible values: true and false. Enumerations are way to let you define your own types like that to say this type has two values, sort ascending or descending. This type has four values, for the four different card suits in a standard card deck, or a user can be in one of four states: pending, approved, cancelled, or active. And so those are the four possible values that this variable type can have. And what that looks like varies widely depending on the language. In a language like C or c++, it's just a thin layer on top of integer constants, which means they get compiled away to integers at compile time and they don't actually do all that much, they're a little bit to help for reading. At the other end of the spectrum, you have languages like rust or Swift, where enumerations are a robust Advanced Data Type, and data construct of their own. That also supports algebraic data types, we'll get into that a bit more later. And is a core part of how a lot of the system actually works in practice, and a lot of other languages are somewhere in the middle. Our goal with this RFC, is to give PHP more towards the advanced end of enumerations, because there are perfectly good use cases for it so let's not cheap out on it. Derick Rethans 3:14 What is the syntax? Larry Garfield 3:15 Syntax we're proposing is tied into the fact that enumerations as we're implementing them, are a layer on top of objects, they are internally objects with some limitations on them to make them enumeration like. The syntax is if I can do this verbally: enum, curly brace, a list of case foo statements, so case ascending, case descending, case hearts, case spades, case clubs, space diamonds, and so on. And then possibly a list of methods, and then the closed curly brace. So it looks an awful lot like a class because internally, the way that we're implementing them, the enum type itself is a class, it compiles down to a class inside the engine, but instead of being able to instantiate your own instances of it. There are a fixed number of instances that are pre created. And those are the only instances that are ever allowed to exist. And those are assigned to constants on that class, and you have the user status, then you have a user status enum, which has a case approved, or active, and therefore there is in the engine, a class user status which has a constant on it, public constant named approved, whose value is an instance of that object. And then you can use that like a constant, pretty much anywhere and it will just work. The advantage of that over just using an object like we're used to, is you can now type a parameter, or a property or return type for: This is going to return a user status, which means you're guaranteed by the syntax that what you get back from that method is going to be one of those four value objects, each of which are Singleton's so they will always identity compare against each other. Larry Garfield 5:07 So just like if you type something as Boolean, you know, the only cases you ever have to consider are true and false. If you type something as user status, you know, there are only four possible cases you ever have to think about, and you know exactly what they mean and you can document them, and you don't need to worry about what if someone passes in, you know, a string that says cancelled instead of rejected or something like that. That's syntactically impossible. Derick Rethans 5:32 What happens if you do that? Larry Garfield 5:33 You get a syntax error. If for example, a method is typed as having a parameter that takes a user status and you pass in the string rejected. Well, it's a string but what's expected is a user status object. So it's going to type error. Derick Rethans 5:47 You get a type error yes. Larry Garfield 5:49 If you try and pass in user status, double colon rejected, when that's not a type, you get an error because there is no constant on that class named rejected therefore, you'll get that error instead. Building on top of objects means an awful lot of functionality, kind of happens automatically. And the error checking you want to kind of happens automatically. And it's in ways that you're already used to working with variables in PHP. Derick Rethans 6:14 For example, it also means that enums will be able to be auto loaded because they're pretty much a class? Larry Garfield 6:19 Exactly. They autoload exactly the same way as classes, you don't you didn't even do anything different with them just follow the standard PSR four you're already following, and they'll auto load like everything else. Derick Rethans 6:28 That also means that class names and enum names, share the same namespace then? You can't have an enum with the same name as a class. Larry Garfield 6:36 Right. Again, making it built on classes means all of these. So it's kind of like a class, the answer is yes, almost always, and it makes it easier to work with is more convenient, it's like you're used to. Derick Rethans 6:47 I suppose it's also makes it easier to implement? Larry Garfield 6:50 Substantially. Derick Rethans 6:51 I'm familiar with enums in C, or c++ where enums is basically just a way of creating a list of integers, that's auto increase right, then I've been reading the RFC, it doesn't seem to do anything like that. How does this work, are the constants on the enums classes, are they sort of backed by static values as well? Larry Garfield 7:12 They can be. There's some subtlety here that's one of the last little bits we're still ironing out as we record this, the user status active constant is backed by an object that is an instance of user status and has a hard coded name property on it, called active. That is public but read only. That's how internally in the engine, it keeps track of which one is which. And there is only ever that one instance of user status with the name active on it. There is another instance that has name cancelled or name pending or whatever. Derick Rethans 7:53 There's no reason to be more than one instance of these at all. Larry Garfield 7:58 Then you can look at that name property but that's really just for debugging purposes, most of the time you'll just use user status double colon active user colon user status colon approved, whatever as a constant and, roll, roll with that. The values themselves don't have any intrinsic primitive backing, as we've been calling it. You can optionally specify that enumeration values of this type, are going to have an integer equivalent, or a string equivalent. And those have to be unique and you have to specify them explicitly. The syntax for that would be enum user status colon string, curly brace, blah blah blah. And then each case is case active equals. Derick Rethans 8:46 Is it required that the colon string part is part of your enum declaration? Larry Garfield 8:51 If you wanted to have a primitive backing, yes, a scalar equivalent. We're actually calling its scalar enums at the moment because that's an easier word to say than primitive. The idea here is enumerations themselves conceptually should not just be an alias for a scalar, they have a meaning conceptually unto themselves rather than just being an alias to an integer. That said, there are plenty of cases where you want to be able to convert enumeration case/enumeration value, into a primitive to store it in a database, to show it on a web page, where I can't store the object in a database that doesn't really work well, MySQL doesn't like that, I added this ability to define a scalar equivalent of a primitive. And then, if you do that, you get two extra pieces that help you. One is a value property, that is again a public read only property, which will be that value, so if approved, or active is a, then user status double colon active arrow value will give you the value A. Which means if you have an enum and you want to save it to a database table, you just grab the value property and that's what you write to the database, and then it's just a string or integer or whatever it is you decide it to make that. When you then load it back up, there is a from method. So you can say user status double colon from. If it's some primitive, and it will return the appropriate object Singleton object for that type so user status double colon active. Derick Rethans 10:28 You mentioned that enums are quite like classes, does that also mean, you can define methods on the enums? Larry Garfield 10:35 Enums can have methods on them, both normal objects methods and static methods. They can also take interfaces, and then they have to conform to that interface like any other class. They can also use traits, as long as the traits do not have any properties defined, as long as they are method only traits. We are explicitly disallowing properties, because properties are a way to track and maintain and change state. The whole point of an enumeration is that the enumeration value with the case is a completely defined instance of into itself and user status active is always equal to user status active. Derick Rethans 11:16 And you can't do that, if there are properties. Larry Garfield 11:19 Right. You don't want the properties of user status to change dynamically throughout the course of the script that's not a thing. Derick Rethans 11:25 Yeah, what would you use methods for on enums? Larry Garfield 11:29 There's a couple of use cases for dynamic methods or object methods, the most common would be a label method. As an example, if you have a user status enumeration, the example we keep going back to. You can have a method for label. Give each of them a make it a scalar enum, so they all have a string or an integer equivalent that you can save for a database, then you give them all a label method. A which in this case would be a single label method that has a match statement inside it and just switches based on the, the enumeration, expect matching enumeration fit together perfectly they're, they're made for each other, Derick Rethans 12:07 And also match will also tell you if you have forgotten about one of the cases. Larry Garfield 12:12 Ilija started working on match in PHP 8.0, he knew he wanted to do enumerations, I came in later. This is a long plan that he's been working on I've joined him. You have a label method that matches on the enumeration value, which is dollar this inside the method will refer to the enumeration object you're on, just like any other, and then return a human friendly label for each of those. You can then call a static method called cases, which will return an array of the enumeration cases you have, you'll loop over that and read the value property, and you read the label, and you stick that into a select box or checkboxes or whatever, you know template system you have. There you have a way to attach additional static data to a enumeration without throwing on more stateful properties. That's the most common case, there are others. You can have a enumeration method that returns a specific other enumeration case. Like if you're on enumeration you say what's next, you're creating a series of steps like a junior very basic state machine. And you call next and it'll return, another enumeration object on the same type for whatever the next one is, or whatever else you can think of them. I'm sure there are other use cases I'm not thinking of. For static methods, the main use case there is as an alternate constructor. So if you want to say you have a size enumeration small, medium, large, you have a from length static method, you pass it an integer, and it will map that integer to the small, medium or large enumeration case based on whatever set of rules how you want to define which size is which. Probably the most common case for our static method is that kind of named constructor, essentially, maybe others, cool, come up with some. Derick Rethans 14:19 During our compensation you mentioned two functions: cases and from, where do these methods come from? Larry Garfield 14:26 cases is defined on all enumerations period, it just comes with it. You are not allowed to define your own, and it returns. It's a static method, and it returns a list of the instances for that that enum type from the static method that is generated automatically on scalar enum so if there's a scalar backing, then this from method gets automatically generated, you cannot define it yourself it's an error to do so. Derick Rethans 14:55 Is it an error, even if it isn't a scalar enum, to define the from method? Larry Garfield 15:01 At the moment I don't believe so you can put a from on a non scalar enum, like unit enum. We recommend against it. Maybe we should block that just for completeness sake, I'll have to talk to Ilia about that. There's a few edge cases like that we're still dealing with. Nikita has a suggestion for around error handling with from, for what happens if you pass an invalid scalar to from, we're not quite sure what the error handling of that is. It may involve having an extra method as well. In that case, details like that we're still working on as we speak, but we're at that level of shaving off the sharp corners, the bulk of the spec is pretty well set at this point. Derick Rethans 15:45 When I read the RFC it mentioned somewhere that adding enumerations to PHP is part of a larger effort to introduce something called algebraic data types. Can you explain what these are, and, and which future steps and proposals would additionally be needed to support these there? Larry Garfield 16:05 We're deliberately approaching this as a multi part RFC. The larger goal here is to improve PHP's ability to make invalid states, unrepresentable, which is not a term that I have coined I've read it elsewhere. I think I've read it multiple elsewhere, but its basic idea is you use the type system to make it impossible to describe states that are allowed by your business rules, and therefore, you don't need to write error handling for them. You don't need to write tests for them. Enumerations are a form of that, where you don't need to write a test for what happens when you pass user status joking, to a method, because that's a type error, that's already covered, you don't need to think about it any more. Derick Rethans 16:53 Would that be a type error or would you get an error for accessing a constant, that doesn't exist on the enum if you say user status joking? Larry Garfield 17:02 I think you'd actually get an error on a constant that doesn't exist in that case. If you just pass a string, an invalid string to the method and you would get a type error on string doesn't match user status, and the general idea being you structure your code, such that certain error conditions, physically can't happen. Which means you don't need to worry about those error conditions, it's especially useful where you have two things that can occur together but only in certain combinations, my favourite example here, if you're defining an airlock. You have an inner door and an outer door. The case of the indoor and outdoor both being open at the same time is bad, you don't want that. So you define your possible states of the airlock using an enumeration to say: inner door closed out other door closed, is one state is one enumeration case; inner open outer closed as one state, and inner closed outer open is one state. Those are the only three possible values that you can define so you cannot even describe the situation of both doors are open and everyone dies. Therefore, you cannot accidentally end up in that state. That's the idea behind making invalid states unrepresentable. Algebraic data types are kind of an extension of the idea of enumerations further by adding the ability for the cases, to have data associated with them. So they're no longer singleton's. There's a number of different ways that you can implement that different languages do it in different ways. Most of them, bake them into enumerations somehow for one reason or another, a few use the idea of sealed classes, which are a class, which is allowed to be extended by only these three or four or five other classes, and those are the only subclasses you're allowed to have. They're conceptually very similar, we're probably going to go with enumeration based, but there's still debate about that, there's no implementation yet. The idea with ADTs as we're envisioning them is, you can define, like, like you can have an enum case that is just its own thing. You can have an enum case that's backed by a scalar, or you can have an enum case, that has data values associated with it. And then it's not a singleton, can spawn new instances of it. But then those instances are always immutable. The most common example of that that people talk about would be: You can now, implement, monads very easily. I'm not going to go into the whole, what is a monad here. Derick Rethans 19:36 It's a word that I've heard many times but have no idea what it means. Larry Garfield 19:40 You should read my book I explained it wonderfully. But it allows you to say, the return value of this method is either nothing, or there is something and that something is this thing. An enumeration, where the two possible types are nothing, and something. And the something has parametrised with the thing that it actually is carrying. That's one use case. Another use case Rust actually in their documentation is a great example of this. If you're building a game, then the various moves that a player can take: turn left, turn right, move forward, move backward, shoot, back up, those are a finite set of possible moves so you make that an enumeration. And then those are the only moves possible in the code. But some of them, like, move forward by how much. By how much is a parametrised value on the move forward enum case. And you can then write your code to say alright, these are the five possible actions a player can take; these three have no extra parameters, this one has an extra parameter. And I don't need to think about anything else because nothing else is can even be defined. So that's the idea behind ADTs, or algebraic data types. There are way to do the kind of things you can do now just in a much more compact and guaranteed fashion, you can do that same thing now with subclasses, but you can't guarantee that no one else is adding another subclass you have to think about. With an ADT you can guarantee that there's no other cases and there will be no other cases. Larry Garfield 21:17 Another, add on that we're looking at is pattern matching. The idea here is match right now, the match construct in PHP eight, do an identity match. So match variable against triple equals, various possibilities, and that covers a ton of use cases it's wonderful. I love match. However, it would be also very useful to say match this array, let's say an associative array, against a case where the Foo property is bar, I don't care about the rest. If the case where the foo property is beep. And I don't care about the rest I'm gonna do stuff with the rest on the right side of match. With enumerations, that comes in with ADTs, where I want to match against: Is it a move left match against move left and then I want to extract that how far is the right hand side and do something with that. If it's a move right on extract the how far is that on the right. If it's a turn left. Well that doesn't have an associated property so I can match that and do whatever with that. It's a way to deconstruct some value, partially, and then take an action based on just part of that object. That's a completely separate RFC, that, again, we think would make enumerations, even more powerful, and make it a lesson number of other things more powerful of make the match statement, a lot more robust, all the pieces of that we're looking at is looking at match against type, so you can do an instance of or an isint or whatever check as part of the match. Again this is all still future RFC nothing has actually been coded for this yet, it's in the, we would like to do this next if someone lets us Derick Rethans 23:12 Coming back to this RFC. What has the feedback been to it so far? Larry Garfield 23:17 Very positive. I don't think anyone has pushed back on the idea. Or earlier draft used a class per case, rather than object per case, which a couple of people push back on it being too complicated so we simplified that to the object per case. And at this point, I think we're just fighting over the little details like: Does that from method return null, or does it throw? What does it throw? Do we have a second method that does the opposite? Exactly what does reflection look like? We have an isenum function or do we have an enum exists function? We're at that level. I expect this RFC is probably going to pass with like 95%. At this point, something like that. It's not a controversial concept, it's just making sure all the little details are sorted out, the T's are crossed, and the i's are dotted and so on. Derick Rethans 24:09 Did we miss anything in the discussion about enums, do you have anything to add? Larry Garfield 24:15 I don't think so, at least as far as functionality is concerned. To two other things process wise: one, I encourage people to look at this multi stage approach for PHP. I know I've talked about this on, on this podcast before; having a larger plan that you can work on in pieces and then multiple features that fit together nicely to be more than the sum of their parts. It's a very good thing, we should be doing more of that, and collaborating on RFC so that small targeted functionality adds up to even more functionality when you combine them, which is what we're looking to do with enums, and ADTs, and pattern matching for example. Credit where it's due, Ilija has done all of the code for this patch. I'm doing design and project management on this, but actually making it work is 100% Ilija he deserves all the credit for that, not me. Derick Rethans 25:06 Thank you, Larry for explaining enums to me today. I learned quite a lot. Larry Garfield 25:11 Thank you. See you on a future episode, hopefully. Derick Rethans 25:16 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Enumerations Episode #51: Object Ergonomics Episode #69: Short Functions Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 72: PHP 8.0 Celebrations!

November 26, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 72: PHP 8.0 Celebrations! London, UK Thursday, November 26th 2020, 09:35 GMT In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 8.0. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:23 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:32 This is Episode 72. PHP eight is going to be released today, November 26. In this episode, we look back across the season to find out which new features are in PHP eight dot zero. If I have spoken with the instigator of each of these features, I'm letting them explain what this new feature is. in the first episode of this current year, I spoke with Nikita Popov about weak maps, a feature that builds on top of the weak references that were introduced in PHP seven four. I asked: What's wrong with the weak references and why do we now need to weak maps. Nikita Popov 1:10 There's nothing wrong with references. This is a reminder, what weak references are about, they allow you to reference, an object, without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference, and if you try to access it you will get acknowledged sort of the object. Now the probably most common use case for any kind of weak data structure is a map or an associative array, where you have objects and want to associate some kind of data with some typical use cases are caches or other memoize data structures. And the reason why it's important for this to be weak, is that you do not. Well, if you want to cache some data with the object and then nobody else is using that object, you don't really want to keep around that cache data, because no one is ever going to use it again, and it's just going to take up memory usage. And this is what the weak map does. So you use objects as keys, use some kind of data as the value. And if the object is no longer used outside this map, then is also removed from the map as well. Derick Rethans 2:29 A main use case for weak maps will likely be ORMs and related tools. In the next, episode 38, I discussed this trainable interface with Nicolas Grekas from Symfony fame. I asked: Nicolas, could you explain what stringable is. Nicolas Grekas 2:45 Hello, and stringable is an interface that people could use to declare that they implement some the magic to string method. Derick Rethans 2:53 That was a short and sweet answer, but a reason for wanting to introduce this new interface was much more complicated, and are also potential issues with breaking backwards compatibility. Nikolas replied to my questioning about that with: Nicolas Grekas 3:06 That's another goal of the RFC; the way I've designed it is that I think the actual current code should be able to express the type right now using annotations, of course. So, what I mean is that the interface, the proposal, the stringable is very easily polyfilled, so we just create this interface in the global namespace that declare the method and done, so we can do that now, we can improve the typing's now. And then in the future, we'll be able to turn that into an actual union type. Derick Rethans 3:39 I had a chat with Nikita Popov about union types as part of the last season in Episode 33 before PHP seven four was released, but after the feature freeze for it happened. I happen to speak with Nikita quite a lot because he does so much work improving PHP. In episodes 40 and 43, we discussed a bunch of smaller features and tweaks to the language. First up was the static return type, Nikita explains: Nikita Popov 4:05 So PHP has a three magic special class names that's self for into the current class parent, refering to the parent class, and static, which is the late static binding class name. And that's very similar to self. If no inheritance is involved than static is the same as self, introducing refers to the current class. However, if the method is inherited. And you call this method on the child class. Then, self is still going to refer to the original class, the parent. Well static is going to refer to the class on which the method was actually called. Derick Rethans 4:50 Next, most the class name literal on objects, which adds the following: Nikita Popov 4:55 And that just returns to the fully qualified class name, for example, have a use statement for that class, you can get back the full name instead of the short name. I think we've had this since PHP five five. That's a great feature because it's like makes it clear whether you're referencing the class and not just some random string, and that means, for example, that the IDE refactorings can work better and so on. Derick Rethans 5:20 In PHP seven, we touched up inconsistencies in PHP is compound variable syntax, but we missed a few cases, Nikita explains: Nikita Popov 5:28 All of these remaining consistencies are like really really minor things and edge cases. But weirdly, all or at least most of them are something that someone that at some point ran into and either open the bug or wrote me an email, or on Twitter. So people somehow managed to still run into these things. Derick Rethans 5:56 But syntax changes are not the only thing that needs fixing; sometimes we also get the theory slightly wrong. In this case related to inconsistencies with traits, Nikita explains again: Nikita Popov 6:07 The problem is that traits are sometimes not self contained. So to give a specific example we have in the logger PSR. We have a trait called logger treat, which has a bunch of methods like: warning, error, info, notice, and so on. So we just simple helper methods which all call the log method with a specific log level, and this trait only specified these helper methods, but still requires the actual class to implement the log method that we usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait and to have a number of abstract methods that the trait itself requires to work. This already works fine. The problem is just that these methods are not actually validated, or they are all inconsistently validated. Even though the trait specifies this abstract method, you could implement it in the class with a completely different signature. Derick Rethans 7:09 Probably one of the big new features and PHP 8.0 are attributes, certainly the most discussed new feature with various RFCs to introduce a feature, and then change the syntax a few times. It all started with Benjamin Eberlei introducing a feature in April, ran Episode 47 I asked Benjamin, what attributes are. He replied: Benjamin Eberlei 7:30 They are a way to declare structured metadata on declarations of the language. So in PHP, or in my RFC, this would be classes, class properties, class constants, and regular functions. You could declare additional metadata there that sort of tags, those declarations with specific additional machine readable information. Derick Rethans 7:56 At the moment, many tools are already used up block comments to do a similar thing. So I asked how attributes are different. Benjamin answered with his first proposed syntax: Benjamin Eberlei 8:06 The idea is that we introduce a new syntax that is independent of the docblock comments. Essentially, before each declaration, you can use the lesser-than symbol twice, then the attribute declaration, and then the greater-than sign twice. This is the syntax, I've used from the previous attributes RFC. Dmitri at that point, use the syntax from Hack. And it makes sense to reuse this not because Hack and PHP are going in the same direction anymore, but because Hack at that point they introduced that they had the same problems with which symbols are actually still easy to use. And we do have a problem in PHP a little bit with that kind of sort of free symbols. We can still use at certain places, and lesser-than and greater-than at this point are easy to parse. There are a bunch of alternatives, and one thing that I would probably proposes an alternative syntax, where we start with a percentage sign, then a square bracket open, and then a square bracket close. It is more in line with our Rust declares attributes, by Rust uses the sort of the hash symbol which we can't use because it's a comment in PHP. Derick Rethans 9:23 He already hinted at alternatives for syntax and we'll get back to that in a moment. However, the main important thing is what was inside the surrounding attribute notation, and Benjamin explains again: Benjamin Eberlei 9:35 If you declare an attribute name, and then you sort of have a parenthesis, open parenthesis close to pass optional arguments. You don't have to use them so you can only use the attribute name. If you sort of want to tag something just, this is a validator, this is an event listener, or whatever you come up with to use attributes for. But if you need to configure something in addition, then you can use the syntax sort of looks like if you would construct a new class, except that you don't have to put the new keyword in front of it. Derick Rethans 10:10 In the rest of that episode we spoke about how to use attributes and what the general ideas behind them were. Soon after the attributes RFC was accepted, Benjamin proposed a second related RFC to tweak some of the working that came up throughout the discussion phase. At the time of the original RFC he did not want to make any changes in order to keep the discussion focused, but there were some tweaks necessary. In Episode 64 Benjamin explains what these changes were: Benjamin Eberlei 10:38 There was renaming the attribute class. So, the class that is used to mark an attribute from PHPAttribute to just Attribute. I guess we go into detail in a few seconds but I just list them. The second one is an alternative syntax to group attributes and yeah save a little bit on the characters to type and allow to group them. The third was a way to validate which declarations an attribute is allowed to be set on, and the fourth was a way to configure if an attribute is allowed to be declared once or multiple times, on one declaration. Derick Rethans 11:23 All the suggested tweaks passed with ease. A contentious issue, however, was the syntax to enclose the attributes. Two RFCs later, the PHP development team finally settled on the syntax which has attributes enclosed in hash, square bracket open, and close with the square bracket. Derick Rethans 11:44 George Peter Banyard likes tidying up things in the language. I spoke with him on several occasions this season, where he was suggesting to just do that. In the first instance he's suggesting to change PHP to use locale independent floating point numbers to string conversions. He explains the problem: George Peter Banyard 12:02 Currently, when you do a float to string conversion. So or casting or displaying a float, the conversion will depend on like the current locale. So instead of always using like the decimal dot separator. For example, if you have like a German or the French locale enabled, it will use like a comma to separate like the decimals. Derick Rethans 12:23 He explained what he suggested to change: George Peter Banyard 12:26 Change more or less to always make the conversion from float to string, the same so locale independent, so it always uses the dot decimal separator. With te exception of printf was like the F modifier, because that one is, as previously said locale aware, and it's explicitly said so. Derick Rethans 12:46 The second RFC was candidly titled saner numeric strings. I asked George, what the scope of the problem he wanted to address is. George Peter Banyard 12:55 PHP has the concept of numeric strings, which are strings which have like integers or floats encoded as a string. Mostly that would arrive when you have like a GET request or a POST request and you take like the value of the form, which would be in a string. And the issue is that PHP makes some kind of weird distinctions, and classifies numeric strings in three different categories mainly. So there are purely numeric strings, which are pure integers or pure float, which can have an optional leading whitespace and no trailing whitespace. However trailing white spaces are not part of the numeric string specification in the PHP language. To deal with that PHP has a concept of leading numeric strings, which are strings which are numeric but ... Derick Rethans 13:42 As you can hear the way how PHP handles these numbers in strings is extremely complicated. The fix is just as complicated, but it pretty much boils down to stop treating strings like "5elephant" as a number. In the last episode, I briefly discuss Larry Garfield's object ergonomics article where he sets out a more coherent way forwards into thinking on how to solve some more of the bigger pain points of PHP. Most related to value objects. Although he did not end up proposing any RFCs himself, Nikita Popov did take some inspiration from it. And he proposed two features for inclusion into PHP eight: constructor property promotion, and named arguments. In Episode 53 Nikita explains with constructor property promotion intends to solve. Nikita Popov 14:29 Right now, if we take a simple example from the RFC, we have a class Point, which has three properties, x y and z. And each of those has a float type. And that's really all the class is ideally, this is all we would have to write. But of course, to make this object actually usable we also have to provide a constructor. And the constructor is going to repeat that, yes, we want to accept three floating point numbers, x y and z as parameters. And then in the body we have to again repeat that. Okay, each of those parameters needs to be assigned to a property. So we have to write this x equals x, this y equals y, this z equals z. I think for the Point class. This is still not a particularly large burden. Because we have like only three properties. The names are nice and short, the types are really short, and we don't have to write a lot of code. But if you have larger classes with more properties, with more constructor arguments, with larger and more descriptive names, and also larger and more descriptive type names. And this makes up for quite a bit of boilerplate code. Derick Rethans 15:52 I asked: What is the syntax that you're proposing to improve this? Nikita Popov 15:57 The syntax is to merge the constructor and the property declarations, so you declare the constructor, and you add an extra visibility keyword in front of the normal parameter name. So instead of accepting float x, in the constructor. You accept public float x. And what this shorthand syntax does is to also generate the corresponding property. So you're declaring a property, public float x, and to also implicitly perform this assignment in the constructor body so to assign this x equals x. This is really all it does so it's just syntactic sugar. It's a simple syntactic transformation that we're doing, but that reduces the amount of boilerplate code you have to write for value objects in particular, because for those commonly, you don't really need much more than the properties and the constructor. Derick Rethans 16:58 Tying in the constructor property promotion was a slightly more controversial RFC, named arguments, which I discussed with Nikita in Episode 59. I asked him what named arguments are: Nikita Popov 17:09 Currently if you're calling a function or a method you have to pass the arguments in a certain order. So in the same order in which they were declared in the function or method declaration. And what named arguments are, and parameters allow us to do, is to instead specify the argument names, when doing the call. Just taking the first example from the RFC, we have the array_fill function, and the array_fill function accepts three arguments. So you can call like array_fill(0, 100, 50). Now, like what what does that actually mean. This function signature is not really great because you can't really tell what the meaning of this parameter is and, in which order you should be passing them. So with named parameters, the same code would be something like array_fill( start: 0, number: 100, value: 50). And that should immediately make this call, much more understandable, because you know what the arguments mean. And this is really one of the main like motivations or benefits of having named parameters. Derick Rethans 18:21 We also briefly touched on the main issues where the introduction of named arguments could introduce backward compatibility issues. Nikita Popov 18:28 If you don't use named arguments that nothing is going to break. But of course, if named arguments are used with codes that did not expect them, then we can run into some issues. So that's one of the issues. And the other one is more of a like long term maintenance concern, that if we introduce named parameters, then those parameters become significant to the API. Which means you cannot rename parameter names in minor versions of libraries if you're semver compatible. Of course, you might be breaking some codes, using those parameter names. And I think one of the biggest concerns that has come up in the discussion is that this is a significant increase in the API burden for open source libraries. Derick Rethans 19:15 Beyond the main features that we've discussed so far. PHP eight also outs a few smaller ones. For example, the non capturing catches, which I discussed with Max Semenik in Episode 58. He explains his short proposal: Max Semenik 19:29 In current PHP, you have to specify a variable for exceptions you catch, even if you don't need to use this variable in your code. And I'm proposing to change it to allow people to just specify an exception type. Derick Rethans 19:48 This proposal password 48 votes for, and one against. The last few major PHP releases a lot of focus was put into strengthening PHP's type system. You see that and PHP seven four with additions to OO variance rules, and in PHP eight already with union types, the stringable interface and a static return type. Dan Ackroyd explains in episode 56, why he was suggesting to ask the predefined union type "mixed". Dan Ackroyd 20:14 I have a library for validating parameters, and due to how that library needs to work the code passes user data around a lot. Internally, and then back out to whether libraries return the validator's result. So I was upgrading that library to PHP 7.4, and that version introduced property types, which are very useful things. What I was finding was that I was going through the code, trying to add types everywhere occurred. And as a significant number of places where I just couldn't add a type, because my code was holding user data. It could be any other type. The mixed type had been discussed before, an idea that people kind of had been kicking around but it just never been really worked on. So that was the motivation for me, I was having this problem where I couldn't upgrade my library, as I wanted to, I kept forgetting: has this bit of code here, been upgraded and I just can't add a type, or is it the case that I haven't touched this bit of code yet. Derick Rethans 21:16 When I spoke with Dan, he also mentioned that sometimes he assists with writing RFCs in case some person would benefit from some technical editing, for example, due to language barriers. In the same way I ended up speaking to Dan again in Episode 65 about a null safe operator, on which he was working with Ilija Tovilo. That explains what a feature is about. Dan Ackroyd 21:38 Imagine you've got a variable that's either going to be an object, or it could be null, so the variable is an object, you're going to want to call a method on it, which obviously if it's null, then you can't call method on it because it gives an error. Instead, what the null safe approach allows you to do is to handle those two different cases in a single line, rather than having to wrap everything with if statements to handle the possibility that it's null. The way it does this is through a thing called short circuiting, so instead of evaluating whole expression. As soon as use the null safe operator, I want the left hand side of the operator is null, and then get short circuited and it all just evalutates to null instead. Derick Rethans 22:18 It also gets an additional benefit related to having shorter code. Dan Ackroyd 22:24 And having the information about what the code's doing in the code, rather than in people's heads makes a lot easier for compilers and static analyzers to their jobs. Derick Rethans 22:35 The last big nice syntax feature in PHP eight zero is the match expression, again by Ilija Tovilo. Instead of me interviewing Dan again, I decided as a joke to interview myself on the subject. It was a little bit surreal but I think it worked out well enough, as a one off event. I first explained to myself the problem with the existing switch language construct. Derick Rethans 22:56 So, before we talk about the match expression, we really need to talk about switch. Switch is a language construct in PHP that you probably know allows you to jump to different cases depending on the value. So you have to switch statement: switch, parentheses opening, variable name, parenthesis closes, and then for each of the things that you want to match against your use case condition, and that condition can be either static value or an expression. But switch has a bunch of different issues that are not always great. So the first thing is that it matches with the equals operator, or the equals, equals sign. And this operator as you probably know, will ignore types, causing interesting issue sometimes when you're doing matching with variables that contain strings with cases that contains numbers, or a combination of numbers and strings. So, if you do switch on the string foo, and one of the cases has case zero, then it will still be matched because it could type juggle the foo to zero, and that is of course not particularly useful. At the end of every case statement you need to use break, otherwise it falls down to the case that follows. Now sometimes that is something that you want to do, but in many other cases that is something that you don't want to do and you need to always use break. If you forget, then some weird things will happen sometimes. Anothercommon thing to use it switches that we sit on on a variable. And then, what you really want to do is the result of, depending on which case's being matched assign a value to a variable. And the current way how any student now is case, say case zero, $result equals string one, break, and you have case two where you don't set: return value equals string two and so on and so on. Which isn't always a very nice way of doing it because you keep repeating the assignment, all the time. And another but minor issue with switch is that it is okay not to cover every value with a condition. So, it's totally okay to have case statements, and then not have a condition for a specific type and switch doesn't require you to add default at the end either, so you can actually end up having a condition that would never match any case, and you have no idea that that would happen. Derick Rethans 25:16 Before I went into details, I also explained how the new match language construct could solve some of these criticisms. Derick Rethans 25:24 The match expression is a new language keyword, which also allows you to switch depending on a condition matching a variable. You're saying matching this variable against a set of expressions just like you would do with switch. But there's a few major differences with switch here. Unlike switch, match returns a value, meaning that you can do return value equals match, then your variable that you're matching on, and the value that gets assigned to this variable is the result of the expression on the right hand side of each condition. Derick Rethans 26:04 That's it for the new features in PHP eight, but I haven't spoken yet about a reason why the PHP team is releasing PHP eight and not PHP seven dot five. And of course, that is PHP's new JIT engine that is slated to improve performance, quite a lot. I have some concerns of my own. And in Episode 48 I spoke with Sara Goleman, which articulated my main concerns with it more eloquently. Sara Golemon 26:31 If you go and look at the engine, particularly the runtime pieces of the engine, although the compiler's complex as well. You have to do a lot of digging before you even get to a point that you can see how the pieces maybe start to fit together. You and I have spent enough time in the engine code that we know where to look for a particular thing like let's say that opcode you mentioned that implements strlen. We know that Zend VM def dot h has got the definition for that. We also know that that file is not real code, it's a pre processed version of code that gets built later on. Somebody coming to that blind is not going to see a lot of those pieces. So there's already this big ramp up just to get into the Zend engine, as it exists now in 7.4. Let's add JIT on top of that, you've got code that is doing call forward graphs and single static analysis and finding these tracelets, and making sense of the code at a higher level than a single instruction at a time, and then distilling that down to instructions that the CPU is going to recognize, and CPU instructions are these packed complex things that deal with immediates and indirects, and indirects of indirects, and registers, and the x86 call API is a ridiculous thing that nobody should ever have to look at. So you add all this complexity to it, that by the way, sits in ext/opcache, it's all isolated to this one extension, that reaches into the engine and fiddles around with things to make all this JIT magic happen and we're going to take your reduced set of developers who know how to work on Zend engine and you're going to reduce that further. I think at the moment it's still only about three or four people who actually understand how PHP's JIT is put together enough that they can do any effective work on it. Derick Rethans 28:20 I am still a little apprehensive about whether the effort of introducing a JIT engine is going to pay off. I certainly hope that I'm going to be proven wrong and that the JIT engine is going to be a massive performance boost. But in the end, we do definitely need more people to understand and work on the PHP engine and a new JOT engine that is built into opcache. Perhaps that's something you yourself might want to have a look at in 2021. PHP 8 will be out later today and I hope that you're pleased with all the new features that a PHP development worked on hard throughout the year. With this I'm concluding this episode and also this year's season. I will be back in the new year with more episodes where I hope to demystify the development of the PHP engine some more. Enjoy the holidays and stay safe. Derick Rethans 29:08 Thank you for listening to this installment of PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next year. Show Notes Episode #33: Union Types Episode #38: WeakMaps Episode #39: Stringable Interface Episode #40: Static Return Type, Class Name Literal on Object, and Variable Syntax Tweaks Episode #43: Validation for abstract trait methods Episode #47: Attributes v2 Episode #48: PHP 8.0 and JIT Episode #52: Locale-independent float to string cast Episode #53: Constructor Property Promotion Episode #54: Magic Method Signatures Episode #59: Named Arguments Episode #64: Attribute Amendments, Shorter Attributes Syntax, and Shorter Attributes Syntax Change Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 71: What didn’t make it into PHP 8.0?

November 19, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 71: What didn’t make it into PHP 8.0? London, UK Thursday, November 19th 2020, 09:34 GMT In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 7.4, but did not end up making the cut. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 71. At the end of last year, I collected snippets from episodes about all the features that did not make it into PHP seven dot four, and I'm doing the same this time around. So welcome to this year's 'Which things were proposed to be included into PHP 8.0, but didn't make it. In Episode 41, I spoke with Stephen Wade about his two array RFC, a feature you wanted to add to PHP to scratch an itch. In his own words: Steven Wade 0:52 This is a feature that I've, I've kind of wish I would have been in the language for years, and talking with a few people who encouraged. It's kind of like the rule of starting a user group right, if there's not one and you have the desire, then you're the person to do it. A few people encouraged to say well why don't you go out and write it? So I've spent the last two years kind of trying to work up the courage or research it enough or make sure I write the RFC the proper way. And then also actually have the time to commit to writing it, and following up with any of the discussions as well. Steven Wade 1:20 I want to introduce a new magic method the as he said the name of the RFC is the double underscore to array. And so the idea is that you can cast an object, if your class implements this method, just like it would toString; if you cast it manually, to array then that method will be called if it's implemented, or as, as I said in the RFC, array functions will can can automatically cast that if you're not using strict types. Derick Rethans 1:44 I questioned him on potential negative feedback about the RFC, because it suggested to add a new metric method. He answered: Steven Wade 1:53 Beauty of PHP is in its simplicity. And so, adding more and more interfaces, kind of expands class declarations enforcement's, and in my opinion can lead to a lot of clutter. So I think PHP is already very magical, and the precedent has been set to add more magic to it with seven four with the introduction of serialize and unserialize magic methods. And so for me it's just kind of a, it's a tool. I don't think that it's necessarily a bad thing or a good thing it's just another option for the developer to use Derick Rethans 2:21 The RFC was not voted on and a feature henceforth did not make it into PHP eight zero. Derick Rethans 2:27 Operator overloading is a topic that has come up several times over the last 20 years that PHP has been around as even an extension that implements is in the PECL repository. Jan Bøhmer proposed to include user space based operator overloading for PHP eight dot zero. I asked him about a specific use cases: Jan Böhmer 2:46 Higher mathematical objects like complex numbers vectors, something like tensors, maybe something like the string component of Symfony, you can simply concatenate this string object with a normal string using the concat operator and doesn't have to use a function to cause this. Most basically this should behave, similar to a basic string variable or not, like, something completely different. Derick Rethans 3:16 For some issues raised during the RFC process and Jan explains to the most notable criticisms. Jan Böhmer 3:21 First of all, there are some principles of operator overloading in general. So there's also criticism that it could be used for doing some very weird things with operator overloading. There was mentioned C++ where the shift left shift operator is used for outputting a string to the console. Or you could do whatever you want inside this handler so if somebody would want to save files, or modify a file in inside an operator overloading wouldn't be possible. It's, in most cases, function will be more clear what it does. Derick Rethans 4:01 He also explained his main use case: Jan Böhmer 4:04 Operator overloading should, in my opinion, only be used for things that are related to math, or creating custom types that behave similar to build types. Derick Rethans 4:15 In the end, the operator overloading RFC was voted on. But ultimately declined, although there was a slim majority for it. Derick Rethans 4:24 In Episode 44, I spoke with Máté Kocsis about the right round properties RFC and asked him what the concept behind them was. He explained: Máté Kocsis 4:33 Write once properties can only be initialized, but not modified afterwards. So you can either define a default value for them, or assign them a value, but you can't modify them later, so any other attempts to modify, unset, increment, or decrement them would cause an exception to be thrown. Basically this RFC would bring Java's final properties, or C#'s read only properties to PHP. However, contrary to how these languages work, this RFC would allow lazy initialization, it means that these properties don't necessarily have to be initialized until the object construction ends, so you can do that later in the object's lifecycle. Derick Rethans 5:22 Write once properties was not the only concept that he had explored before writing this RFC. We discussed these in the same episode: Máté Kocsis 5:31 The first one was to follow Java and C# and require all right, once properties to be initialized until the object construction ends, and this is what we talked about before. The counter arguments were that it's not easy to implement in PHP, the approach is unnecessarily strict. The other possibility is to let unlimited writes to these properties, until object construction ends and then do not allow any writes, but positive effect of this solution is that it plays well with bigger class hierarchies, where possibly multiple constructors are involved, but it still has the same problems as the previous approach. And finally the property accessors could be an alternative to write once properties. Although, in my opinion, these two features are not really related to each other, but some say that property accessors could alone, prevent some unintended changes from the outside, and they say that maybe it might be enough. I don't share this sentiment. So, in my opinion, unintended changes can come from the inside, so from the private or protected scope, and it's really easy to circumvent visibility rules in PHP. There are quite some possibilities. That's why it's a good way to protect our invariance. Derick Rethans 7:02 In the end this RFC was the client, as it did not wait to two thirds majority required with an even split between the proponents and the opponents. Derick Rethans 7:11 Following on from Máté's proposal to add functionality to our object orientation syntax. I spoken Episode 49 with Jakob Givoni on a suggested addition COPA, or in full: contact object property assignments Jakob explains why he was suggesting to add this. Jakob Givoni 7:28 As always possible for a long time why PHP didn't have object literals, and I looked into it, and I saw that it was not for lack of trying. Eventually I decided to give it a go with a different approach. The basic problem is simply to be able to construct, populate, and send an object in one single expression in a block, also called inline. It can be like an alternative to an associative array: you give the data, a well defined structure, the signature of the data is all documented in the class. Derick Rethans 8:01 Of course people abuse associative arrays for these things at the moment, right. Why are you particularly interested in addressing this deficiency as you see it? Jakob Givoni 8:11 Well I think it's a common task. It's something I've been missing as I said inline objects, obviously literals for a long time and I think it's a lot of people have been looking for something like this. And also it seemed like it was an opportunity that seemed to be an fairly simple grasp. Derick Rethans 8:28 I also asked them what the main use case for this was. Jakob Givoni 8:32 Briefly, as I mentioned, they're data transfer objects, value objects, those simple associative arrays that are sometimes used as argument backs to constructors when you create objects. Some people have given some examples where they would like to use this to dispatch events or commands to some different handlers. And whenever you want to create, populate, and and use the object in one go, COPA should help you. Derick Rethans 9:04 COPA did also not make it into PHP eight with the RFC being the client nearly unanimously. The proposals by both Máté and Jakob where meant to improve PHP object syntax by helping out with common tasks. The implementation ideas of what they were trying to accomplish were not particularly lined up. This spurred on Larry Garfield to write a blog post titled: object ergonomics, which are discussed with him in Episode 51. I first asked him why he wrote this article: Larry Garfield 9:33 As you said, there's been a lot of discussion around improving PHP's general user experience of working with objects in PHP, where there's definitely room for improvement, no question. And I found a lot of these to be useful in their own right, but also very narrow, and narrow in ways that solve the immediate problem, but could get in the way of solving larger problems later on down the line. I went into this with an attitude of: Okay, we can kind of piecemeal attack certain parts of the problem space, or we can take a step back and look at the big picture and say: All right, here's all of the pain points we have, what can we do that would solve, not just this one pain point, but let us solve multiple pain points with a single change, or these two changes together solve this other pain point as well, or, you know, how can we do this in a way that is not going to interfere with later development that we talked about. We know we want to do, but hasn't been done yet. Are we not paint ourselves into a corner by thinking too narrow. Derick Rethans 10:40 The article mentions many different categories and possible solutions. I can't really sum these up in this episode because it would be too long. Although, Larry did not end up proposing RFC based on this article, it can be called responsible for constructor property promotions, which I discussed with Nikita Popov in Episode 53 and Named Arguments which are discussed with Nikita in Episode 59. Both of these made it into PHP 8.zero and cover some of the same functionality that Jakob's COPA RFC covered. I will touch on the new features that did make it into PHP 8.0 in next week's episode. There are two more episodes where discuss features that did not make it into PHP eight zero, but these are still under discussion and hence might make it into next year's PHP eight dot one. In Episode 57, I spoke with Ralph Schindler about his conditional code flow statements RFC. After the introduction, I asked what he specifically was wanting to introduce. Ralph Schindler 11:36 This is, you know, it's, it's very closely related to what in computer science is called a guard clause. And I used that phrase lightly when I originally brought it up on the mailing list but it's very close in line to that it's not necessarily exactly that, in terms of the syntax. In terms of like when you speak about it in the PHP code sense, it really is sort of a change in the statement. So putting the return before the if, that's really what it is. So a guard clause, it's important to know what that is is it's a way to interrupt the flow of control Derick Rethans 12:08 Syntax proposals are fairly controversial, and I asked Ralph about his opinions of the type of feedback that he received. Ralph Schindler 12:15 The smallest changes always get the most feedback, because there's such a wide audience for a change like this. Derick Rethans 12:23 The last feature that did not make it into PHP eight zero was property write/set visibility, which I discussed with André Rømcke in Episode 63. I asked him what his RFC was all about: Derick Rethans 12:34 What is the main problem that you're wanting to solve with what this RFC proposes? André Rømcke 12:40 The high level use case is in order to let people, somehow, define that their property should not be writable. This is many benefits in, when you go API's in order to say that yeah this property should be readable. But I don't want anyone else but myself to write it. And then you have different forms of this, you have either the immutable case where you, ideally would like to only specify that it's only written to in constructor, maybe unset in destructor, maybe dealt with in clone and so on, but besides that, it's not writable. I'm not going into that yet, but I'm kind of, I was at least trying to lay the foundation for it by allowing the visibility or the access rights to be asynchoronus, which I think is a building block from moving forward with immutability, read only, and potentially also accessors but even, but that's a special case. Derick Rethans 13:39 At the time of our discussion he already realized that it would be likely postponed to PHP eight dot one as it was close to feature freeze, and the RFC wasn't fully thought out yet. I suspect we'll hear more about it in 2021. With this I would like to conclude this whirlwind tour of things that were proposed but did not make it in. Next week I'll be back with all the stuff that was added to PHP for the PHP eight zero celebrations. Stay tuned. Derick Rethans 14:09 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes Episode #41: __toArray() Episode #42: PECL overload Episode #44: Write Once Properties Episode #49: COPA Episode #51: Object Ergonomics Episode #57: Conditional Codeflow Statements Episode #63: Property Write/Set Visibility Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 70: Explicit Octal Literal

November 12, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 70: Explicit Octal Literal London, UK Thursday, November 12th 2020, 09:33 GMT In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed to add an Explicit Octal Literal to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:24 This is Episode 70. Today I'm talking with George Peter Banyard, about a new RFC that he's just proposed for PHP 8.1, which is titled explicit octal literal. Hello George, would you please introduce yourself? George Peter Banyard 0:38 Hello Derick, I'm George Peter Banyard, I'm a student at Imperial College London, and I contribute to PHP in my free time. Derick Rethans 0:46 Excellent, and the contribution that you're currently have up is titled: explicit octal literal. What is the problem that this is trying to solve? George Peter Banyard 0:56 Currently in PHP, we have four types of integer literals. So decimal numbers, hexadecimal, binary, and octal. Decimal is just your normal decimal numbers; hexadecimal starts with 0x, and then hexadecimal characters so, null to nine and A to F, and then binary starts with 0b, and then it's only zeros and ones. However, octal notation is just a decimal, something which looks like a decimal number, which was a leading zero, which doesn't really look that much different than a decimal number, but it comes from the days from C and everything which just uses like a zero as a prefix. Derick Rethans 1:48 But I have seen is people using like array keys for the, for the month names right and they use 01, 02, 03, you get 07, and 08 and 09, and then they look at the arrays. They notice that they actually had the zeroth element in there but no, but no eight or nine. That's something that is that PHP no longer does I believe. No, it's mostly that the parser doesn't pick it up anymore. Instead of silently ignoring the eight, it'll just give you an error. You've mentioned that there's these four types of numbers with octal being the one started with zero. But what's the problem with is that a moment? George Peter Banyard 2:31 Sometimes when you want to use, which looks like decimal number. So, for example, you're trying to order months, and use like the full two digits for the month number, instead of just one, you use 01, as an array key. When you get to array, it will parse error because it can't pass 08 as an octal number, which is very confusing, because it. Most people don't deal with octal numbers that often, and you would expect everything to be decimal. Because numeric strings are always decimal, but not integers literals. So, the proposal is to add an explicit octal notation, which would be 0o. So python does that, JavaScript has it, Rust also has it, to allow like a by more explicit to say oh I'm dealing with an octal number here. This is intended. Derick Rethans 3:33 Beyond having the 0b for binary, and the 0x for hexadecimal, the addition of 0o for octal is the plan to add. And is that it? George Peter Banyard 3:45 That's more or less the proposal. It's non-BC, because the parser before would just parse or if you had 0o, so there's no PC very possible numeric strings are not affected because since PHP 7.0 hexadecimal strings are not handled anymore as numeric strings. Numeric strings will always be decimal integers, literals will have your four different variants, and maybe a future proposal is to deprecate the implicit octal notation to always make a decimal, even if you have leading zeros. Derick Rethans 4:21 At the moment, if I do as a string literal 014, and do an echo that I get 12. George Peter Banyard 4:27 Because then it's interpreted as an octal. The most bizarre example is if you do var_dump string of 014 double equal to 014, you will get false, because one is interpreted as well 14, like the numeric string is interpreted as 14, whereas the octal number, which says 014 as an integer literal is interpreted as an octal number, which is 12, which is slightly confusing for most people, because that also if you because PHP, most, we all deal with like HTTP requests, and I GET and POST a data, which everything is in strings because it's a text protocol. And if you get user output, which is like I don't know, naught 14 and you're, are you intending to compare munz numbers which are or. 01201, and then you get to array, well then you just fail. Derick Rethans 5:22 Of course, removing that support means a BC breaking change, which phones happen until PHP nine, of course, which might be a while away from now let's say that. George Peter Banyard 5:31 Probably five years, if we're going through the timelines from PHP seven to PHP 8, but to be able to deprecated and remove it. Well, you need to add support for something else. So that's more the long term plan. Derick Rethans 5:46 And your proposal is basically to make it equivalent to binary and hexadecimal numbers, so that it is less confusing in general. George Peter Banyard 5:55 Yes, that's why the RFC is very short. Derick Rethans 5:58 What are octal numbers actually used for? George Peter Banyard 6:02 The only practical use case that I've seen is for Linux permissions, so chmod. Execute read and write, are those who permissions which chmod will use an octal number. Derick Rethans 6:15 In a different order though but George Peter Banyard 6:17 Yes, I don't know chmod though on top of my heart. Derick Rethans 6:20 Is it only Linux permissions that you can think of? Is there anything else? I can't either so I'm asking you. George Peter Banyard 6:25 No, I can't. That's why I find it very odd that like the leading zero just makes it octal instead of anything else. I mean it has precedence because many other languages do that like C, Java, I don't know, many any language I suppose was just picked it up from. I think C. But when I looked into the history, weirdly enough before C. They had a prefix for like binary, octal, and dec, and hexadecimal. But then the one for octal just got dropped, for some reason. Derick Rethans 6:57 Maybe because the zero and the "o" next each other look very the same. We've already touched on whether there are BC breaks or not, BC standing for backwards compatibility. And, there shouldn't be any because it's something that a parser currently doesn't understand. But do other build-in extensions need to be modified for example? George Peter Banyard 7:18 We have two extensions, which one which deals with numbers, so which is GMP, which is arbitrary precision arithmetic. And then there's the filter extension to filter octal, which filters data and tells you if it's valid or not and it gives you back a, like a correct integer or something like that, which is the filter extension, which has an octal filter. Both of these extensions have been modified to support like the prefix notation, and interpreted as a valid octal number. And then we have like the function which is oct2dec, which is basically octal to decimal, which which weirdly enough already supported like the octal prefix. Derick Rethans 7:59 But that accepts strings, I suppose? George Peter Banyard 8:01 Yes that that accepts strings. Derick Rethans 8:04 And it already supported the 0o prefix? George Peter Banyard 8:07 Yes, which is very on point for PHP I feel. Some things are just supported randomly in one side but not everywhere else. Derick Rethans 8:15 It's a surprise for me that is what I can say. So, yeah, you mentioned as a short RFC, you think there will be any extensions to this in the future? You already mentioned having it maybe deprecating the current just zero prefix? George Peter Banyard 8:31 So one other possible future scope is with the prefix to reintroduce octal, binary, and hexadecimal numbers. As with the prefixes as numeric strings. If you type, 0xAABBCC in, and you have that as a string, which could be useful if you get like colorus back from, from a webform, that would be automatically converted into an integer, or not automatically converted if you do like if you compare it to numbers, or if you cast it to an integer, because currently if you get 0x, something and you cast it to an integer, you will get zero. So that way you need to use like a function like hex2dec, or oct2dec, or bin2dec to convert from a string, or to another string and then cast that. Or it may be cast directly to an integer, I'm not exactly sure. But that's also debatable if it's something we want to add. Derick Rethans 9:37 Is it actually possible to do, for example with hexadecimal numbers, do like if you have inside a string. Can you do xAA, does that actually work? George Peter Banyard 9:48 I didn't think so. Derick Rethans 9:49 That actually works. You can do var_dump("x6A") and it gives you the letter J. George Peter Banyard 9:55 The more, you know. Derick Rethans 9:56 But it doesn't work for binary, or octal. Only for hexadecimal with x. So I guess that's something that could be added to string interpolation at some point. George Peter Banyard 10:07 PHP is so weird sometimes. Derick Rethans 10:10 Yes, I mean PHP does things in its own way, however, making this kind of small changes to it, just end up improving the language step by step and that is of course the way forward. Right. George Peter Banyard 10:23 Yeah. Derick Rethans 10:25 And I'm looking forward to more of these small incremental changes in the future as well. George Peter Banyard 10:30 Seems like a good plan. Derick Rethans 10:32 Are you planning any more? George Peter Banyard 10:34 Well, so I went through some of the old RFCs, most notably the one about when the whole scalar type thing was going on. We had like strict types and then we had like the coercive types. One which was by Dmitri, Zeev, pretty sure Stas, and um forgot, forgetting somebody else. But some of them, some of the ideas they had, which was making some of the type juggling more strict, so float to integer conversions. Currently, even if the floating number has like decimal part, it will just truncate it to an integer, and it won't emit any warning and it will just like pass without any issue, I think that may be is kind of unexpected. I made the other warning to that to possibly make it a type error in the future. Derick Rethans 11:24 You mean upon a cast? George Peter Banyard 11:26 If you've type hint function as accepting only integers, so if you say foo(int $bar), and you pass it the float. And you would like in normal mode, it will truncate, and it will just pass an integer. Derick Rethans 11:40 Because it's just typed. George Peter Banyard 11:42 Yes, and we've had multiple reports of people being very confused about why it's just truncating the numbers, because it's not even rounding up. If you had like if you have like 0.9 it won't round up to one it will just truncate to zero, which a lot of people are confused by. Derick Rethans 11:58 In strict mode doesn't do that? George Peter Banyard 11:59 Yeah, because strict mode is very strict and will only allow you to pass explicitly what's been what you've requested, with the exception of the normal integer to float conversion which is lossless. Derick Rethans 12:12 That's lossless up to a certain point yes. George Peter Banyard 12:14 To a certain point like your integer doesn't fit, then it goes overflow to a float. Derick Rethans 12:19 All right. George thank you very much for taking your time this afternoon to talk to me. George Peter Banyard 12:23 Thank you for having me. Derick Rethans 12:26 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Explicit octal integer literal notation Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 69: Short Functions

November 05, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 69: Short Functions London, UK Thursday, November 5th 2020, 09:32 GMT In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a new RFC that's he proposing related to Short Functions. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:24 Hello, this is Episode 69. Today I'm talking with Larry Garfield, about an RFC that he's just announced called short functions. Hello Larry, would you please introduce yourself? Larry Garfield 0:35 Hello World, I'm Larry Garfield, the director of developer experience at platform.sh. These days, you may know me in the PHP world mainly from my work with PHP FIG. The recent book on functional programming in PHP. And I've gotten more involved in internals in the last several months which is why we're here. Derick Rethans 0:57 I'm pretty sure we'll get back to functional programming in a moment, and your book that you've written about it. But first let's talk about short functions, what are short functions, what is the problem that are trying to solve? Larry Garfield 1:11 Well that starts with the book actually. Oh. Earlier this year, I published a book called Thinking functionally in PHP, on functional programming in PHP, during which I do write a lot of functional code, you know, that kind of goes with the territory. And one of the things I found was that the syntax for short functions, or arrow function, or can be short lambdas, or arrow functions, you know whatever name you want to give them, was really nice for functions where the whole function is just one expression. Which when you're doing functional code is really really common. And it was kind of annoying to have to write the long version with curly braces in PSR 2, PSR 12 format for functions that I wanted to have a name, but we're really just one line anyway does return, blah blah blah. It worked, got the job done. Larry Garfield 2:13 Then hanging around with internals people, friend of the pod Nikita Popov mentioned that it should be really easy. Now that we've got short functions, or short lambdas, do the same thing for named functions. And I thought about. Yeah, that should be doable just in the lexer, which means, even I might be able to pull it off given my paltry miniscule knowledge of PHP internals. So, I took a stab at it and it turned out to be pretty easy. Short functions are just a more compact syntax for writing functions or methods, where the whole thing is just returning an expression. Derick Rethans 2:56 Just a single expression? Larry Garfield 2:58 Yes. If your function is returning two parameters multiplied together, it's a trivial case but you often have functions or methods that are doing. Just one expression and then returning the value. It's a shorter way of writing that. Mirrored on the syntax that short lambdas use. It doesn't enable you to really do anything new, it just lets you write things in a more compact fashion. But the advantage I see is not just less typing. It lets you think about functions in a more expression type way, that this function is simply a map from input to this expression, which is a mindset shift. So yes it's less typing but it's also I can think about my problem as simply an expression translation. And that's very very common in functional programming, also it helps encourage writing pure functions which functional programming is built on, and is just general good practice anyway, pure functions have no side effects. Take no stealth input from globals or anything like that. And their only output is their return value. You want most of your codebase to be that, and a short function is really hard to not make that, you can but it's hard not to. In one sense it's just syntax saving, in another sense it's enabling a more functional mindset, as you're writing code, which I'm very much in favor of. Derick Rethans 4:27 What is it that you're proposing then? Larry Garfield 4:29 The specific syntax is the same function, signatures we have now: function name, open paren, parameters, closed paren, optionally colon return type. But then instead of open curly brace, it's just a double arrow expression semi colon, the exact same body style as a short lambda has. Derick Rethans 4:52 Or as a match expression? Larry Garfield 4:54 Or a match expression or, you know, just array values, you know they're all examples of double arrow expression, which means this thing becomes that thing, wraps that thing. And so it's a very familiar syntax, and it can all go on on one line or you can drop it to the next line and indents depending on how long your code is. I've done both with short lambdas, they both read just fine. That is the exact same semantic meaning as open curly brace return that expression closed curly brace. Derick Rethans 5:28 Yeah, and that wouldn't allow you to allow multiple statements in that case, because the syntax just wouldn't allow for it. Larry Garfield 5:34 Correct. There's just like short lambdas. If you have multiple statements we'll get to that, that's not a thing. There is discussion of having short lambdas now take multiple lines. I haven't weighed in on that yet I'm not getting into that, at this point. Derick Rethans 5:50 Where came up more recently again was with the match expression for PHP 8.0, where, because you're limited to this one expression on the right hand side, there was originally talk of extending that to multiple lines, but then again that wouldn't met, that wouldn't match with the short lambdas that we have, or your short functions if they make it into the language. Larry Garfield 6:11 Right, all of that comes down to the idea of block expressions, which would be a multi lines set of statements that gets interpreted as an expression, which doesn't exist in PHP today, is how something like Rust works, but lifting that into PHP while there might be advantages to it is a big lift, and that needs a lot of thinking through to make it actually makes sense, it may not make sense. Again, I have not dug into that in much detail yet, other than to say, we really need to think that through before doing anything about it. Derick Rethans 6:45 Yeah, especially about what kind of return value you'll end of returning from it because a return without the return keyword is tricky to. It's tricky to create semantic reasoning for and how to do that right. Larry Garfield 6:56 Yeah, there's a lot of semantic trickery involved there so I explicitly avoiding that in this RFC, it's a nice simple surgical change. Derick Rethans 7:05 You mentioned some use cases in the form of, they're useful in functional programming, but most people don't use functional programming with PHP, or maybe in your opinion don't use it yet. Would would be use cases for non functional programming with PHP for this new syntax? Larry Garfield 7:22 Even if you're not doing formal functional programming. There's still a lot of cases where you have a function that just ends up being one line because that's all it needs to be. Especially if you're doing object oriented code. How many classes have you written that have, they're an entity class and they have eight properties, which means you have eight getter methods on them, each of which does nothing except return this arrow, you're removing three lines out of each one of those again using standard syntax conventions. By using a short function for that. You may also have a lot of refactoring techniques encourage producing single line functions or single line methods. For example, if you have an if statement, or a while statement or some other kind of check and check is A and B, and or C equals D, or some kind of complex logic there. Very common recommendation is alright break that out to a utility method, or utility function that can give a name to and becomes more self documenting. This is a good refactor and giving you a bunch of single line functions that are just really an expression. So, I'll write structure them as just an expression. My feeling is is more advantageous with standalone functions than with methods, but most of the logic applies to both equally well. Derick Rethans 8:51 In the case of setters and getters, that actually makes quite a bit of sense righ? Larry Garfield 8:55 It's just for getters for setters generally you set something, and then return this or return null, or something like that and that is a different statement, so it wouldn't work for setters. There are ways to click around that by calling a sub function there which I'm not actually going to encourage but you can do. Derick Rethans 9:15 Yeah, I guess you could create a lambda. Larry Garfield 9:17 And actually what you do is have a function that just takes a parameter and ignores and returns a second parameter. And then the body of your setter is calling that function with the Sep sabyinyo this row foo equals whatever, then your second parameter is this, and it ends up working. I don't actually suggest people do that. Derick Rethans 9:39 It's also too complicated for me to understand what you're trying to say there so let's, let's not encourage that use of it. Larry Garfield 9:46 I did it just to see if I could not because it's a good idea. Derick Rethans 9:50 Okay, your conclusion was, it's not a good idea. Larry Garfield 9:54 Cute hack. Derick Rethans 9:55 I saw in a discussion on the mailing list, some people talking about why is this using function and not fn. What is your opinion about that? Larry Garfield 10:04 Mainly because using function was easier to implement initially. So that's what I went with; just the way some of the lexer rules are structured, it was a bit trickier to use, to do fn. And I figured go with the easy one. That said, Sara Goleman gave a patch that's takes care the FN part. So there are patches available for both. Personally I moderately prefer function, I think it has less confusion. And if you have to convert a function from one to the other, it's less work then. But I don't really care all that much so if the consensus is we like the feature but we want to use fn I'm good with that too. There's some interesting discussion around, we were saying, there are some people trying to push right now to have short lambdas also take multiple statements, or to have long lambdas, anonymous functions, do auto capture. And so this question is now okay, the double arrow versus the FN, which one means auto capture, which one means single expression. I haven't weighed in on that yet. It'd be sense to sort all of it out and make it all, logically consistent, but as long as things are consistent I don't particularly care which keyword gets used where. Derick Rethans 11:20 By adding this feature to PHP, the syntax feature, is there a possibility for backward compatibility breaks? Larry Garfield 11:27 I don't believe so. The syntax I'm proposing would be a syntax error right now, so there shouldn't be any backward compatibility issues. Other use case I forgot to mention before, is if you're doing functional style code. Then, very often want to branch, your logic. Very concisely without full if statements, people are used to Haskell, I use the pattern matching and stuff like that, I'm not proposing that here. But the new match statement in PHP eight zero is a single expression. That gives you a branching capability. And so that dovetails together very nicely to say: okay, here's a function branch, using a match statement based on its inputs. And it maps to a single expression. Could be a call to another function, could be just a single expression. It's just another place where it doesn't make possible, anything you couldn't do before. It just makes certain patterns more convenient. Derick Rethans 12:25 So no BC breaks, but more use cases. Can you see what else could be added to this kind of style functionality for the future? Larry Garfield 12:35 I think this syntax itself is very easily self contained, does one simple thing and does it well and there's not much room to expand on it. There's no reason to have closures on named functions that's not really a thing. One of the things I like about it though, is it dovetails nicely with some other things that are in flight. A teasier for future episodes I suppose I'm collaborating with Ilya Tovolo on enumerations that hopefully will support methods on enumeration values. It is an excellent case for single line expressions because there's not much else to do in a lot of cases. So you can end up writing a very compact enumeration that has methods that are single expression, and boom, you've got a state machine. I've been working on a pipe operator that allows you to chain functions together. You can now have a single line, single expression, function that is just: take input and pipe it through a bunch of other functions. And now you have a complex pipeline that is just one single expression function. Hopefully these things will come to pass. I still got quite a bit of time before eight one's feature freeze, so we'll see what happens, but to me all of these things dovetail together nicely. And I like it when functionality dovetails together nicely. That means you can have functions or functionality that has benefit on its own. But in tandem with something else they're greater than the sum of their parts and that's to me the sign of good language design and good architecture. Derick Rethans 14:09 And we have been getting to some of that that PHP 8.0 with having both named arguments and promoted constructor arguments for example. Larry Garfield 14:17 Exactly. You talked about that several episodes ago. Derick Rethans 14:20 Would you have anything else to add? Larry Garfield 14:22 Like, the. A lot of the changes that have happened in PHP eight that have made, 7.4 and eight, that have made functional style code more viable and more natural. And that's the direction that I'm hoping to push more with my limited technical skills in this area, and better design skills and collaborating with others, but there's a lot of targeted things we could do in PHP 8.1 to make functional style code easier and more viable, which works really well in a request response environment. Which PHP's use cases are very well suited to functional programming. I'm hoping to have a lot of these small targeted changes like this that add up to continuing that trend of making PHP a more functional friendly language, Derick Rethans 15:11 But you're not proposing to turn into a functional language altogether? Larry Garfield 15:15 A strictly functional language? No, that would not make any sense. A language in which doing functional style things is easier and more natural? Absolutely. I think there's a lot of benefits to that, even in a world where people are used to doing things in an OO fashion. Those are not at odds with each other. Functional programming and object oriented code. A lot of the principles of functional programming are also principles of good OO code, like stateless services. That's a pure function by a different name. Ways to do a lot of those things more easily. I think is a benefit to everyone. Those who agree with me, I could use help on that, volunteers welcome. Derick Rethans 15:54 As always, as always. Okay, thank you, Larry for taking the time this morning slash afternoon to talk to me about short functions. Larry Garfield 16:02 Thank you, Derick and take care of PHP world. Derick Rethans 16:06 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me slash patron. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Short Functions Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 68: Observer API

September 17, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 68: Observer API London, UK Thursday, September 17th 2020, 09:31 BST In this episode of "PHP Internals News" I chat with Levi Morrison (Twitter, GitHub) and Sammy Kaye Powers (Twitter, GitHub, Website) about the new Observer API. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 68. Today I'm talking with Levi Morrison, and Sammy Powers, about something called the observer API, which is something that is new in PHP eight zero. Now, we've already passed feature freeze, of course, but this snuck in at the last possible moment. What this is observer API going to solve? Levi Morrison 0:44 the observer API is primarily aimed at recording function calls in some way so it can also handle include, and require, and eval, and potentially in the future other things, but this is important because it allows you to write tools that automatically observe, hence the name, when a function begins or ends, or both. Derick Rethans 1:12 What would you use that for? Levi Morrison 1:13 So as an example, Xdebug can use this to know when functions are entered or or left, and other tools such as application performance monitoring, or APM tools like data dog, New Relic, tideways, instana so on, they can use these hooks too. Derick Rethans 1:38 From what I understand that is the point you're coming in from, because we haven't actually done a proper introduction, which I forgot about. I've been out of this for doing this for a while. So both you and Sammy you work for data dog and work on their APM tool, which made you start doing this, I suppose. Sammy Kaye Powers 1:54 Yeah, absolutely. One of the pain points of tying into the engine to to monitor things is that the hooks are insufficient in a number of different ways. The primary way that you would do function call interception is with a little hook called zend_execute_ex and this will hook all userland function calls. The problem is, it has an inherent stack bomb in it where if, depending on your stack size settings you, you're going to blow up your stack. At some point if you have a very very deeply deep call stack in PHP, PHP, technically has a virtually unlimited call stack. But when you use zend_execute_ex, it actually does limit your stack size to whatever your settings are your ulimit set stack size. One of the issues that this solves is that stack overflow issue that you can run into when intercepting userland calls but the other thing that it solves is the potential JIT issues that are coming with PHP eight, where the optimizations that it does could potentially optimize out a call to zend_execute_ex where a profiling or APM tracing kind of extension would not be able to enter set that call, because of the JIT. The Observer API enables to solve multiple issues with this. Not only that, there's more. there's more features to this thing, because zend_execute_ex by default will intercept all userland function calls, and you have no choice but to intercept every single call, whereas, this API is designed to also allow you to choose which function calls specifically you want to intercept, so there is on the very first call of a function call. And it'll basically send in the zend function. This is a little bit of a point we've been kind of going back and forth on what we actually send in on the initialisation. But at the moment it is a zend function so you can kind of look at that and say okay do I want to monitor this function or observe it. If your extension returns a handler and says this is my begin handler This is my end handler. Those handlers will fire at the beginning and end of every function call. So it gives you a little bit of fine grain sort of resolution on what you want to observe. The other really kind of baked in design, part of the design is, we wanted it to play well with neighbours, because some of the hooks, at the moment, well, pretty much all of the hooks. Aside from typical extension hooks. Whenever you tie into the engine it's very easy to be a noisy neighbor. It's very easy not to forward a hook along properly it's very easy to break something for another extension. This has like kind of neighbour considerations baked right in. So when you actually request a begin and end hook. It will manage on its side, how to actually call those, and so you don't have to forward along the hook to other other extensions to make it a little bit more stable in that sense. Derick Rethans 4:52 From working on Xdebug, this is definitely problem forwarding both zend_execute_ex and also zend_execute_internal, which is something I also override are of course. And I think there are similar issues with, with the error display as well and PHP eight will also have a different, or a new API for that as well. Also coming out of a different performance monitoring tool, which is interesting to see that all these things works. You mentioned the Zend function thing and I'm not sure how well versed the audiences and all this internal things what is this zend function thing? Levi Morrison 5:24 as any function in the engine is what represents a function so not the scope that it's called in but the scope that it's defined in. It represents both method calls and function calls. It's triggered whenever a user land function is in play. So it has the function name, the name of the class that it's associated with, it tells you how many parameters you have and things like this. It does not tell you the final object that it's called with, and this is partly why we are debating what exactly should get passed in here, because some people may care. Oh, I only want to observe this with particular inheritors or, or other things of that nature so there's a little bit of fine tuning in the design perhaps still but the basic idea is you'll know the name of the function. What class it's in, and it's bound late enough in the engine that you would also have access to whatever parents that class has, etc. Derick Rethans 6:33 Does it contain the arguments as well, that are being sent, or just a definition of the arguments? Levi Morrison 6:38 The Zend function only contains the definition of the arguments. The hook is split into three sections kind of so there's like initialisation and then begin and end. Initialisation only gives you the Zendo function but to begin and gives you access to what's called the Zend execute data which has more information, including the actual arguments being passed. Derick Rethans 7:03 Okay, so it's the idea of the initialisation, just to find out whether you actually want to intercept the function. And if you want remember that and if not it wouldn't ever bother are the trying to intercept that specific zend function either. Sammy Kaye Powers 7:17 Actually what we actually pass into that initialization function is has been sort of up for debate. The original implementations, that is plural. We've had many different implementations of this thing over the, over the year. Derick you did mention that this got squeezed in last minute it has been a work in progress for a very long time and it actually is fulfilling JIT work so there's a specific mention in the JIT RFC that that mentions an API that is going to be required to intercept some function calls that are optimized out so that's why we were able to sneak in a little bit past feature freeze on the actual merge I think. But what we actually sent into this initialization function is spin up two for debate based on how we've actually implemented it. One of the original early implementations actually called this initialization function during the runtime cache initialization, just basically kind of a cache that gets initialized before the execute data actually is created. We didn't have the option of sending in the execute data at that time, we did have the zend function. So we were sending that in. Later on this implementation get refactored to a number different ways. We have the option now to send an execute data if we wanted to, but it might be sufficient to send in the op array instead of the Zend function. The op array should be the full sort of set of opcodes that basically is a function definition from the perspective of of internals, but it also includes like includes and evals. Having additional information at initialisation might actually be handy. I think we're still kind of maybe thinking about that potentially changing I don't know what do you think Levi. Levi Morrison 8:59 Yeah, you can get the oparray from the function so it's a little pedantic on which one you pass in I guess, but yeah. The idea is that we don't want to intentionally restrict it. It's just that the implementations have changed over the year so we're not sure exactly what to pass in at the moment. I think a zend function's pretty safe, passing in a zend oparray is perhaps a better signal to what it's actually for, because it can measure function calls, but also include, require, eval. And the oparray technically does contain more information. Again, if you have zend function, you can check to see if it is an oparray and get the operate from the Zend function. So a little pedantic but maybe a little better in conveying the purpose and what exactly it targets. Derick Rethans 9:56 And you can also get the oparray from zend_execute_data. Levi Morrison 10:00 Yeah. Derick Rethans 10:01 If I want to make use of this observe API I will do that? I guess, you said only from extensions and not from userland. Sammy Kaye Powers 10:08 Exactly. At the moment you would as an extension during MINIT or startup, basically in the very early process with the actual PHP processes starting up, would basically register your initialization handler. And at that point, under the hood, the whole course of history is changed for PHP at that point, because there is a specific observer path that happens when an observer extension registers at MINIT or startup. At that point the initialization function will get called for every function call. The very first function call that that function called is called, I know that sounds confusing but if you think you have one function and it's called 100 times that initialization will run one time. That point you can return either a begin and or at an end handler. If you return null handlers it'll never, it'll never bother you again for that particular function, but it will continue to go on that is don't mentioned earlier for every new function that encounters every new function call and encounters, I should say. Derick Rethans 11:12 There is not much overhead, because the whole idea is that you want to do as little overhead as possible I suppose. Levi Morrison 11:19 Exactly, we have in our current design in pre PHP 8.0. You could hook into all function calls using that zend_execute_ex, but it has performance overhead just for doing the call. So let's imagine we're in a scenario where we have two extensions, say Xdebug and one APM product. Both of them aren't actually going to do anything on this particular call it will still call those hooks, which has overhead to it. So if nobody is interested in a function, the engine can very quickly determine this and avoid the overhead of doing nothing. This way we only pay significant costs, if there's something to be done. Derick Rethans 12:09 You're talking about not providing much overhead at all. Just having the observer API in place, was there any performance hits with that? Sammy Kaye Powers 12:17 That was actually one of the biggest sort of hurdles that we had to overcome specifically with Dmitri getting this thing, merged in because it does touch the VM and whenever you touch the VM like we're talking like any tiny little null check that you have in any of the handlers is probably going to have some sort of impact at least enough for Dmitri, who understandably cares about like very very very small overheads that are happening at the VM level, because these are happening for every function call. You know, this is, this is not something that's just happening, you know, one time during the request is happening a lot. In order to apeace Dmitri and get this thing merged in, it basically had to have zero overhead for the production version non observer, his production version but on the non observed version on the non observed path it had to basically reach zero on those benchmarks. That was quite a task to try to achieve. We went through four, about four or five different ways of tying into the engine, we got it down to about, like, two new checks for every function call. And that still was not sufficient, so we end up going with based on Dmitris excellent suggestion, went with the opcode specialization, to implement observers so that at compile time. We can look and see if there's an observer extension present and if there is, it will actually divert the function call related opcodes to specific handlers that are designed for observing and that way once, once you get past that point, the observer path is already determined at compile time and all the observer handlers fire. In a non observed environment, all of the regular handlers will fire without any observability checks in them. Derick Rethans 14:03 At the end of getting within the loss of zero or not? Levi Morrison 14:07 It is zero for certain things. Of course, there are other places besides the VM that you have to touch things here and there for, you know, keeping code tidy and code sane but it's effectively zero, for all intents and purposes. Goal achieved. I will say zero percent. Derick Rethans 14:30 I think the last version of the patch that I saw didn't have the code specialization in it yet. So I'm going to have to have a look at myself again. Levi Morrison 14:39 Yeah, the previous version had very low overhead, so low overhead that you couldn't really observe it through any time based things. But if you measured instructions retired or similar things from the CPU, then it was about point four to 1% reduction, and personally I would have said that's good enough because all of them would correctly branch predict, because you either have handlers in a request, or you don't. And so they would perfectly predict, every time. But still, those are extra instructions technically so that's why Dmitri pushed for specialization and those are no longer there. Derick Rethans 15:27 Does that mean there are new opcodes specifically for this, or is it just the specialization of the opcodes that is different? Sammy Kaye Powers 15:33 It's just this specialization. During the process of going, figuring out what exactly Dmitri needed to mergeable actually proposed an implementation that added basically an observer version of every kind of function call related opcode like do_fcall_observed, or observed_return or something like that. With, opcodes specialization, it reduces the amount of code that you have to write sort of at the VM level, it doesn't change the amount of code that's generated though because with opcode specialization, basically the definition file will get expanded by this, this php file that actually generates C code. When you add a specialization, to a handler that already has specializations on it, it will expand quite considerably. The PR at one point ended up being like 10,000 lines or something like that, so we had to do some serious reduction on the number of handlers that were automatically generated. Long story short, is there are no new opcodes but there are new opcode handlers to handle this specific path. Derick Rethans 16:40 Not sure what, if anything more to ask about the Observer API, do you have anything to ask yourself? Levi Morrison 16:45 I think it's worth repeating the merits of the observer API and where we're coming from. The key benefits in my opinion are that it allows you to target per function interception for observing. It allows you to do it in a way that's that plays nice with other people in the ecosystem and increasingly that's becoming more important. We've always had debuggers and some people occasionally need to turn debuggers on in production and other things like this. But increasingly, there are other products in this space; the number of APM products is growing over time. There are new security products that are also using these kinds of hooks. And I expect over time we will see more and more and more of these kinds of of tools, and so being able to play nicely is a very large benefit. At data dog where Sammy and I both work we've hit product incompatibilities a lot of times, and some people are better to work with than others. I know that Xdebug has done some work to be compatible with our product specifically but you know competitors aren't so interested in that. We care a lot about the community right, we want the whole community to have good tools, and I don't think we actually mentioned yet that we did collaborate with some other people and competitors in this space. That hopefully proves that that's not just words of mine that, you know, we actually met with the competitors who were willing to and discussed API design, and use cases, and making sure that we could all work together and compete on giving PHP good products rather than, you know, hoarding technical expertise and running over each other and causing incompatibilities with each other. So I think those are really important things. And then lastly, it does not have that stack overflow potential that the previous hooks you could use did. Derick Rethans 18:54 Yeah, which is still an issue for Xdebug but but I fixed that in a different way by setting an arbitrary limit on the amount of nested levels you can call, right. Levi Morrison 19:02 Yeah, and in practice that tends to work pretty well because most people don't have intentionally deep code. But for some people they do. And we can't as an APM product for instance say: sorry your code is just not good code, we can't observe a crash your your your thing and so we can't make that decision. And then the biggest con at the moment is that it doesn't work with JIT, but I want to specifically mention that, that's not a technical thing, that's just a not enough time has been put in that space yet because this was crunched to the last second trying to get it in. And so, some things didn't get as much focus yet. Hopefully by the time 8.0 gets released it will be compatible with JIT, or at least it will be only per function, so maybe a function that gets observed, maybe that can't be JIT compiled that specific function call, but all the other function calls that aren't observed would be able to. We'll see obviously there's still work to do there but that's our hope. Derick Rethans 20:10 What happens now, if, if you use the observer API and the JIT engine is active? Does it just disable the JIT engine. Sammy Kaye Powers 20:16 Yep. It just won't enable the JIT at all. In fact, it just goes ahead and disables it, if an observer extension is enabled and there is a little kind of debug message that's emitted inside of the OP cache specific logs that will will say specifically why the JIT isn't enabled just in case you're sitting here trying to turn the JIT on you're like, why isn't enabled, and it'll say there's an observer extension present so we can enable the JIT. Hopefully they'll be able to work a little bit, and maybe just change an optimization level or something in the future. I'd like to give a shout out to Benjamin Eberlei, who has been with us since the very beginning on this whole thing has been vetting the API on his products, has gotten xhprof on not only the original implementation but also on the newest implementation, and has just been a huge help in actually getting this thing pushed in, and was said some of the magic words that actually, this thing merged in, when it was looking like it wasn't gonna land for eight dot O and got it landed for eight dot one so Benjamin gets a huge thumbs up. So, Nikita Popov, Bob Weinand, and Joe Watkins really early on. These are awesome people from internals who have spent some time to help us vet the API, but also to help us with specific implementation details. It's been just a huge team effort from a lot of people and it was just like, really great to work across the board with everybody. Derick Rethans 21:35 Yeah, and the only thing right now of course is all the extensions that do observe things need to be compatible with this. Sammy Kaye Powers 21:43 Exactly. Derick Rethans 21:44 Which is also means there's work for me, or potentially. Sammy Kaye Powers 21:47 Absolutely. Levi Morrison 21:49 I guess one one minor point there is that if an extension does move to the new API, it is a little bit insulated from those that haven't moved to the new API. So, to some degree, it still benefits the people who haven't moved yet because the people who have moved have one less competitor in the same same hook, so it's just highlighting the fact that it plays nicely with other people. Derick Rethans 22:14 Is opcache itself actually going to use it or not? Levi Morrison 22:17 So this is focused only on userland functions; past iterations that was not the case. Dmitri kind of pushed back on having this API for internals and so that got dropped. I don't think at this stage there's any there's any value in opcache using it specifically, but there are some other built in things like Dtrace. I don't know how many people actually use Dtrace; I actually have used it once or twice, but Dtrace could use this hook in the future instead of having a custom code path and things like that. Derick Rethans 22:49 For Xdebug I still need to support PHP seven two and up, so I'm not sure how much work it is doing it right now, but definitely something to look into and move to in the future, I suppose. Well thank you very much to have a chat with me this morning. I can see that for Sammy the sun has now come up and I can see his face. Thanks for talking to me this morning. Sammy Kaye Powers 23:10 Thanks so much, Derick and I really appreciate all the hard work you put into this because I know firsthand experience how much work podcasts are so I really appreciate the determination to continue putting out episodes. It's a huge amount of work so thanks for being consistent. Levi Morrison 23:26 Yeah, thank you so much for having us Derick. Derick Rethans 23:30 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patroen. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes Pull Request: https://github.com/php/php-src/pull/5857 Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 67: Match Expression

August 20, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 67: Match Expression London, UK Thursday, August 20th 2020, 09:30 BST In this episode of "PHP Internals News" I chat with Derick Rethans (Twitter, GitHub, Website) about the new Match Expression in PHP 8. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 67. Today we're going to talk about a match expression. I have asked the author of the match expression RFC, lija Tovilo, whether it wanted to come and speak to me about the match expression, but he declined. As I think it's important that we talk in some depth about all the new features in PHP eight, I decided to interview myself. This is probably going to sound absolutely stupid, but I thought I'd give it a go regardless. So here we go. Derick Rethans 0:53 Hi Derick, would you please introduce yourself? Derick Rethans 0:56 Hello, I'm Derick and I'm the author of Xdebug I'm also PHP seven four's release manager. I'm also the host of this podcast. I'm also you. Derick Rethans 1:07 What a coincidence! Derick Rethans 1:10 So what is the problem that is RFC is trying to solve? Derick Rethans 1:13 Well, before we talk about the match expression, we really need to talk about switch. Switch is a language construct in PHP that you probably know, allows you to jump to different cases depending on the value. So you have to switch statement: switch parentheses open, variable name, parenthesis close. And then for each of the things that you want to match against your use: case condition, and that condition can be either static value or an expression. But switch has a bunch of different issues that are not always great. So the first thing is that it matches with the equals operator or the equals, equals signs. And this operator as you probably know, will ignore types, causing interesting issues sometimes when you're doing matching with variables that contain strings with cases that contains numbers, or a combination of numbers and strings. So, if you do switch on the string foo. And one of the cases has case zero, and it will still be matched because we put type equal to zero, and that is of course not particularly useful. At the end of every case statement you need to use break, otherwise it falls down to the case that follows. Now sometimes that is something that you want to do, but in many other cases that is something that you don't want to do and you need to always use break. If you forget, then some weird things will happen sometimes. It's not a common thing to use it switch is that we switch on the variable. And then, what you really want to do the result of, depending on which case is being matched, assign a value to a variable and the current way how you need to do that now is case, say case zero result equals string one, break, and you have case two where you don't set return value equals string two and so on and so on, which isn't always a very nice way of doing it because you keep repeating the assignment, all the time. And another but minor issue with switch is that it is okay not to cover every value with a condition. So it's totally okay to have case statements, and then not have a condition for a specific type and switch doesn't require you to add default at the end either, so you can actually have having a condition that would never match any case, and you have no idea that that would happen. Derick Rethans 3:34 How's the match expression going to solve all of this stuff? Derick Rethans 3:37 The match expression is a new language keyword, but also allows you to switch depending on a condition matching a variable. You have matching this variable against a set of expressions just like you would switch, whereas a few major differences with switch here. So, unlike switch, match returns a value, meaning that you can do: return value equals match, then your variable that you're matching, and the value that gets assigned to this variable is the result of the expression on the right hand side of each condition. So the way how this works is that you do result equals match, parenthesis, opening your variable name parenthesis close curly braces and then you have your expression which can just be zero, or one, or a string, or floating point number, or it can be an expression. For example, a larger than 42. And then this condition is followed by an arrow, so equals, greater than sign, and then another statement. What a statement evaluates to get assigned to the return value of the match keyword. Which means that you don't have to do in every case, you don't have to return value equals value, or return value equals expression. Now the expression itself, the evaluator what evaluates to gets returned to match. So that is one thing but as a whole bunch of other changes. The matching with a match keyword is done based on strict type. So instead of using the equal operator, the two equal signs, match uses the identical operator which is the three equal signs so that is strict comparison. Normal rules for type coercion are suspended, no matter whether you have strict types defined in your script, meaning that the match keyword always takes care of the type. It will not be any type juggling in there that can create confusing results. I've already mentioned that it can also be an expression on both the left and right hand sides. Switch also allows you to have an expression on the left hand side, although that isn't used very much, I guess. Another difference between switch and match is that, in case a switch on your right hand side, you can have multiple statements so you can have case seven, colon, and then a bunch of statements and statements are ended by the break, pretty much. With match you can only have one expression. It's possible that is going to change in the future, but it is at the moment analogous with what you can do with the short arrow functions, the arrow function with the fn keyword. That also only allows one expression on the right hand side. Switch also doesn't do automatic fall through to the next condition. If you have a single statement that doesn't make a lot of sense anyway, but this is one of the other changes here. So just like switch you can add multiple conditions separated by comma, on the left hand side. So you can do case zero comma one arrow, and then your expression again. As I mentioned with switch, it is possible to not cover all the cases like it is possible to say four possible values for a specific variable. Take for example you have addition, subtraction, multiplication and division. If none of the conditions that you have set up with a match matches. Then you'll get a unhandled match error exception, it is still possible to have a default condition just like you have it switched out. But if you don't cover any of the conditions or default, then you will get an exception. Anything that is actually still the same between switch and a new match. Well yes, any implementation each condition is still, even evaluated in order, meaning that if the first condition doesn't match it start evaluating the second condition or the third condition and so on, and so on. Also, just like switch again, if all the conditions are either numeric values or strings, PHP's engine will construct a jump table. That was introduced somewhere in PHP seven three I think, that automatically jumps to the right case. Switch, all the conditions have to be either integer numbers like 01234 or all strings, like foo, like bar, like baz, or so on and so on. The match statement, actually, it's a bit more flexible here because it can construct one jump table, if all the conditions are numbers or strings. It doesn't matter that are all numbers or all strings. If there are all numbers or integer numbers or strings, then it can construct this jump table. That doesn't work with switch because switch's type coercion and match doesn't and the internal implementation already support like this hashmap which is pretty much an array that supports integer array keys as well as associative array keys. But because for match the, the matching happens independent on the type is actually ends up working, so does actually works a little bit better, which is great. Derick Rethans 8:54 Where there, any other additions that were considered to add to the new match keyword? Derick Rethans 8:58 Well, there were a few things. There was a bit of discussion about blocks, meaning multiple statements to run for each condition. In the end it didn't become part of the RFC, perhaps because it made it a lot more complicated, or perhaps because it was really important to think that functionality through and also at the same time, think about what that does for the short array functions which also, just like match, only support one specific statement at the moment. There were some thoughts about adding pattern matching to the condition just like Perl does a little bit, where yeah like with regular expressions for example, but is also really difficult subject and lots of considerations have to be taken into account so that was also dropped from this RFC. The last one was a quick syntax tweak, which allow you to omit the variable name for a match expression. If you end up matching only against like expressions, like a larger than 42, or b smaller than 12, then it doesn't necessarily matter what you have behind match; the variable name there doesn't matter. So, well the trick that people already use with switch is, is to use switch (true). And with match you can also use match (true), and the addition that was suggested to do here was to be able to not have the true there at all. So, the match would only work on the conditions and not try to match these against a variable, but that also didn't become part of this current RFC. Derick Rethans 10:31 Are there any backward compatibility breaks? Derick Rethans 10:34 Well beyond match being a new keyword, there are none. But because match is a new keyword that we're introducing, it means it can't be used as a full namespace name, a class name, a function name, or a global constant. It shouldn't really be much of a surprise that PHP just can introduce these keywords, and that ends up breaking some code. I haven't looked at any analysis about how much code is actually going to break, but it is possible that it does actually do some. But also PHP also top level namespace so if you had a class name called match, you should have put it in your own namespace. And in PHP eight with Nikita's namespace token names RFC, as long as match is on its own, then your namespace name, you'd still be able to use it now, as part of a namespace name, which isn't possible or wouldn't have been possible with PHP seven four. Derick Rethans 11:33 Okay. What was the reception of this RFC? Derick Rethans 11:37 Initially, there was quite a little bit of going back and forth about especially the pattern matching or a few other things. But in the end were to slightly reduce scope of the RFC, it actually ended up passing very well with 43 votes and two votes against, which means it's now part of PHP eight. In the last week or so we did find a few bugs. Some crashes. But, Ilija the author of both RFC and the implementation is working on these to get those fixed, so I'm pretty sure we'll all quite ready for PHP eight with the new match expression. Derick Rethans 12:12 Thanks Derick for explaining this new match expression. It was a bit weird to interview myself, but I hope it turned out to be fun enough and not too weird. Derick Rethans 12:21 Thanks for having me Derick. Derick Rethans 12:23 This is going to be the last episode for a while, as PHP 8's feature freeze is now in effect, and no new RFCs are currently being proposed. Although I'm pretty sure they are being worked on. There's one exception to the feature freeze period, which is the short attribute syntax change RFC, which which I'm collaborating on the Benjamin Eberlei, whether that will turn into yet another episode about attributes, we'll have to see. For the PHP eight celebrations, I'm hoping to make two compilation episodes again, as it did last season with Episode 36 and 37. For the PHP eight celebrations episodes, I am again looking for a few audio snippets from you, the audience. I'm looking for a short introduction, with no commercial messages, please. After your introduction, then state which new PHP eight feature you're looking most forwards to, or perhaps another short anecdote about a new PHP eight feature. Please keep it under five minutes. With your audio snippet, feel free to email me links to your Twitter, blog, etc. The email address is in the closing section of each of the episodes. Here's an example of what I'm looking for. Derick Rethans 13:35 Hi, I'm Derick, and I host PHP internals news. I am the author of Xdebug and I'm currently working on Xdebug cloud. My favourite new feature in PHP eight are the additions to the type system, but union types and a mixed type continue to strengthen PHP's typing system. We've grown up from simple type hints to real proper types, following the additional property types and contract covariance and PHP seven four. I'm looking forward to PHP's type system to be even stronger, perhaps, with generics in the future. Derick Rethans 14:07 Please record us as a lossless compressed file, preferably FLAC, FLAC, recorded at 44,100 hertz. And if you save them bit of 24 bits that. If you want to make a WAV file or a WAV file that's fine too. Please make them available for me to download on a website somewhere and email me if you have made one. Derick Rethans 14:33 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Match expression Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 66: Namespace Token, and Parsing PHP

August 13, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 66: Namespace Token, and Parsing PHP London, UK Thursday, August 13th 2020, 09:29 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Namespaced Names as a Single Token, and Parsing PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 66. Today I'm talking with Nikita Popov about an RFC that he's made, called namespace names as token. Hello Nikita, how are you this morning? Nikita 0:35 I'm fine Derick, how are you? Derick Rethans 0:38 I'm good as well, it's really warm here two moments and only getting hotter and so. Nikita 0:44 Same here. Same here. Derick Rethans 0:46 Yeah, all over Europe, I suppose. Anyway, let's get chatting about the RFC otherwise we end up chatting about the weather for the whole 20 minutes. What is the problem that is RFC is trying to solve? Nikita 0:58 So this RFC tries to solve two problems. One is the original one, and the other one turned up a bit later. So I'll start with the original one. The problem is that PHP has a fairly large number of different reserved keyword, things like class, like function, like const, and these reserved keywords, cannot be used as identifiers, so you cannot have a class that has the name list. Because list is also a keyword that's used for array destructuring. We have some relaxed rules in some places, which were introduced in PHP 7.0 I think. For example, you can have a method name, that's a reserved keyword, so you can have a method called list. Because we can always disambiguate, this versus the like real list keyword. But there are places where you cannot use keywords and one of those is inside namespace names. So to give a specific example of code that broke, and that was my, my own code. So, I think with PHP 7.4, we introduced the arrow functions with the fn keyword to work around various parsing issues. And I have a library called Iter, which provides various generator based iteration functions. And this library has a namespace Iterfn. So Iter backslash fn. Because it has this fn keyword as part of the name, this library breaks in PHP 7.4. But the thing is that this is not truly unnecessary breakage. Because if we just write Iter backslash fn, there is no real ambiguity there. The only thing this can be is a namespace name, and similarly if you import this namespace then the way you actually call the functions is using something like fn backslash operator. Now once again you have fn directly before a backslash so there is once again no real ambiguity. Where the actual ambiguity comes from is that we don't treat namespace names as a single unit. Instead, we really do treat them as the separate parts. First one identifier than the backslash, then other identifier, then the backslash, and so on. And this means that our reserved keyword restrictions apply to each individual part. So the whole thing. The original idea behind this proposal was actually to go quite a bit further. The proposal is that instead of treating all of these segments of the name separately, we treat it as a single unit, as a single token. And that also means that it's okay to use reserved keywords inside it. As long as like the whole name, taken together is not a reserved keyword. The idea of the RFC was to, like, reduce the impact of additional reserved keywords, introduced in the future so in PHP eight we added the match keyword, which can cause similar issues, and in PHP 8.1 maybe we're going to add an enum keyword, and so on. Each time we add a keyword we're breaking code. The idea of this RFC was to reduce the breakage, and the original proposal, not just allowed these reserved keywords inside namespace names, but also removed the restrictions we have on class names and function names and so on. So you would actually be able to do something like class Match. Then, if you wanted to use the class, you would have to properly disambiguate it. For example, by using a fully qualified name, starting with a backslash or by well or using any other kind of qualified name. So you wouldn't be able to use an isolated match, but you could use backslash match, or some kind of things they just named backslash match. However, in the end, I dropped support for this part of the RFC, because there were some concerns that would be confusing. For example if you write something like isset x, that would call the isset on built in language construct. And if you wrote backslash isset x, that would call a user defined, isset function, because we no longer have this reserved keyword restriction. While this is like an ambiguous from a language perspective, the programmer might get somewhat confused. So if we want to go in this direction we probably want to add more validation about which reserved keywords are allowed in certain positions and I didn't want to deal with that at this point. Derick Rethans 5:33 Also by removing it, you make the RFC smaller which gives usually a better chance for getting them accepted anyway. Nikita 5:40 Yes. Derick Rethans 5:41 You mentioned in the introduction that originally you tried to solve the problem of not being able to use reserved keywords in namespace names. It ended up solving another problem that at the moment you wrote this, you wasn't quite aware of. What is this other problem? Nikita 5:57 The other problem is the famous or infamous issue of the attributes syntax, which we are having a hard time solving. The backstory there is that originally the attributes introduced the PHP eight use double angle brackets, or shift operators as delimiters. Derick Rethans 6:18 I've been calling it the pointy one. Nikita 6:20 Okay, the pointy one. And then there was a proposal accepted to instead change this to double at operator at the start, because that's kind of more similar to what other languages do; they use one @, we will use two @s on to avoid the ambiguity with the error suppression operator. Unfortunately, it turned out that as initially proposed the syntax is ambiguous. All because of quite an edge case, I would say. So attributes in PHP are going to be allowed on parameters, and then parameters can have a type. You can have a sequence where you have @@, then the attribute name, then the type name, then the parameter name. And the problem is that because we treat each part of the name separately, between the backslashes, there can also be whitespace in between. So you can have something like a, whitespace, backslash, whitespace, b. Now the question is, in this case does the ab belong to the attribute so is it an attribute name, or is only the a an attribute name, and b is the type name of the parameter. Yeah that's ambiguous and we can't really resolve it unless we have some kind of arbitrary rule, and while the original proposal did introduce an arbitrary rule that the attribute name cannot contain whitespace, it will be interpreted in a particular way. And what this proposal, effectively does is to instead say that names, generally cannot contain whitespace. Derick Rethans 8:01 Your namespace names as token RFC basically disallows spaces for namespace names which you currently can have? Nikita 8:10 Right. So I don't think anyone intentionally uses whitespace inside namespace names. Or actually, you could even have comments inside them. But it is currently technically allowed, and one might introduce it as a typo. That means that this change is a backwards compatibility break. Because you can have currently whitespace in names, but based on static analysis of open source projects, we found that this is pretty rare. So I think we found something like five occurrences inside the top thousand packages. There is one other backwards compatibility break that is in practice I think much more serious. And that's the fact that it changes our token representation. So instead of representing just names as a string separates a string. We now have three different tokens, which is the one for qualified names, one for fully qualified names. So with everything backslash and one for namespace relative names, which is if you have namespace backslash at the start, and namespace I mean, here literally. So this is a very rarely used on PHP feature. Derick Rethans 9:26 I did not know it existed. Nikita 9:28 I also actually did some analysis for that I found very few uses in the wild. I think mostly people writing the static analysis tools know about that, no one else. But the other problem is that this breaks static analysis tools because they now have to deal with a new token representation. Derick Rethans 9:47 We have been talking about these tokens. What are tokens on like the smallest level. Nikita 9:51 PHP has, what's a three phase compilation process where first we convert the raw source code, which is just a sequence of characters into tokens, which are slightly larger semantic units. So instead of having only characters we recognize reserved keywords and recognize identifiers and operators, and so on. Then on the second phase, we have the parser which converts these individual tokens into larger semantic structures like addition expression, or an assignment expression or class expression and so on. Finally we convert the result of that which is a parse tree or an abstract syntax tree into our actual virtual machine instructions, our bytecode representation. Derick Rethans 10:41 And then PHP eight down to machine code, if the JIT kicks in. I remember from a long time ago when I was in uni, is that there are different kinds of parsers and from what I understand PHP's parser is a context free parser. What is is a context free parser? Nikita 10:57 Well a context free in particular is a term from the CompSci theory, where we separate parsers into four languages, into four levels, though I think this is not a super useful distinction, when it comes to language parsing. Because general context free parser has complexity of O(n^3). Like if you have a very large file, it will take very long to parse it could be with the general context free parser. So what we'll be do in practice are more like deterministic context free languages, which is a smaller set. The formal definition there are a set that can be parsed by deterministic push down automaton but I don't think you want to go there. Derick Rethans 11:41 No, not today. Nikita 11:44 Because in practice actually what we do in PHP is even more limited than that. So the parser we use in PHP is called LA/LR1 parser. So that's a look ahead, left to right, right most derivation parser Derick Rethans 11:59 Also mouthful, but it's simpler to explain. Nikita 12:02 I think really the most relevant parts of this algorithm to us is the "1". What the one means is that we can only use a single look ahead token. When we recognize some kind of structure, we have to be able to recognize it by looking at the current token, and that the next token. At each parsing step, those two things have to be sufficient to recognize and to disambiguate syntax. Actually even more restrictive than that but I think that's a good mental model. This is really, I think the core problem we often have when introducing new syntax, so this is the problem we had with the short arrow syntax, short closure syntax. There we would need not one token of look ahead but an infinite potentially in finite number of look ahead. Derick Rethans 12:55 In case the function keyword is still being used. Nikita 12:59 Yes, that's why we added the fn keyword to what the problem because with the fn keyword we can immediately say at the start of the start of the arrow function that this is an arrow function. And we will have similar problems if we introduce generics in the future, at which point we might actually just give up and switch to a more powerful parser. The parser generator we use, which is bison, has two modes. One is the LA/LR mode. And the other one is the GLR mode. They both use the same base parsing algorithm at first, but the GLR mode allows forking it. So if it encounters an ambiguity, it will actually like try to pursue both possibilities, which is of course a lot less efficient, but it allows us to parse more ambiguous structures. Derick Rethans 13:50 Wouldn't that not cause a problem that has some cases because every token will be ambiguous, it will explode in like an exponential way? Nikita 13:57 Yes, I mean it's worst case exponential. But if you like have a careful choice of where you are ambiguous, then you can see that okay with this particular ambiguity, you can actually get worse still linear for valid code. Derick Rethans 14:14 That makes a decision between two possibilities pretty much at that stage. Nikita 14:18 But I think the danger there is that one might not notice when one introduces ambiguities, or maybe not real syntax ambiguities, you do tend to notice those. But ambiguities on the parser level where the parser cannot distinguish possibilities based on the single token for get. Derick Rethans 14:41 Was there in the past being ambiguities introducing a PHP language, that we fail to see? Nikita 14:46 I don't think so. I mean, because we use this parser generator. It tells us if we have ambiguities, so either in the form of shift/reduce or reduce/reduce conflicts. So we cannot really introduce ambiguities. We can it's possible to suppress them, and we actually did have a couple of suppressed conflicts in the past, this was for one particular very well known ambiguity, which is the dangling elseif/else. Basically that the else part of an ifelse. So if you have a nested if without braces. Then there is an else afterwards, the else could ever belong to the inner if or to the outer if, but this is like a standard ambiguity in all C like languages that allow omitting two braces. Derick Rethans 15:35 That's why coding standard say: Always use braces. The patch belonging to this RFC has been merged. So it means that in PHP eight we'll have the longer or the new token names. Do you think that in the future we'll have to make similar choices or similar adjustments to be able to add more syntax? Nikita 15:56 Yes. As I was already saying before, I think these syntax conflicts, they crop up pretty often. This is like very much not an isolated occurrence. If it's really fundamental. If there are any fundamental problems in the PHP syntax where we made bad choices. I think it's pretty normal that you do run into conflicts at some point. So, especially when it comes to generics using angle brackets, in pretty much all languages I deal with my major parsing issues when it comes to that. For example Rust has the famous turbo fish operator, where you have to like write something like colon colon angle brackets to disambiguate things in this specific situation. Derick Rethans 16:49 I just listened to a podcast on the Go language where they're also talking about adding generics, and they had the exact same issue with parsing it. I think they ended up going for the square bracket syntax instead of the angle brackets, or the pointies, as I mentioned. Nikita 17:04 Everyone uses the pointy brackets syntax, despite all those parsing issues because everyone else use it. For PHP for example, we wouldn't be able to use square brackets, because those are used for array access, and that would be completely ambiguous, but we could use curly braces. Well, people have told me that they would not appreciate that. Let's say it like this. Derick Rethans 17:31 Is it's finally time to start using the Unicode characters for syntax? No! Nikita 17:38 Although I did see that, apparently, some people use non breaking spaces inside their test names, because it looks nice and is pretty crazy. Derick Rethans 17:49 I like using it in my class names as well instead of using study caps. Nikita 17:52 I hope that's a joke, you can ever tell. Derick Rethans 17:54 I like using emojis instead. That's also a joke, but I do use it in my presentations and my slides just to jazz them up a little bit. Nikita 18:01 This is an advantage of PHP, because our Unicode support in identifiers works because the files are in UTF-8. That means that all the non ASCII characters have the top bit set, and we just allow all characters with a top bit set as identifiers. And that means that there is no validation at all that identifiers used contain only like identifier or letter characters, so you can use all of the emojis and whitespace characters and so on, you won't have any restrictions. Derick Rethans 18:36 And that was possible for as long as PHP 3 as far as I know. It's a curious thing because this is something that popped up quite a lot when we're discussing, or arguing, about Unicode. You shouldn't allow Unicode because then you can have funny characters in your function names like accented characters and stuff like that, and then also always fun to point out that yeah you could have done that since 1997. Anyhow, would you have anything else to add? Nikita 19:01 I don't think so. Derick Rethans 19:02 Well thank you very much for having a chat with me this morning. And I'm sure we'll see each other at some other point in the future. Nikita 19:09 Thanks for having me once again Derick. Derick Rethans 19:12 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Treat namespaced names as single token GLR Parser LALR(1) Parser Iter Library RFC: Match expression Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 65: Null safe operator

August 06, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 65: Null safe operator London, UK Thursday, August 6th 2020, 09:28 BST In this episode of "PHP Internals News" I chat with Dan Ackroyd (Twitter, GitHub) about the Null Safe Operator RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 65. Today I'm talking with Dan Ackroyd about an RFC that he's been working on together with Ilija Tovilo. Hello, Dan, would you please introduce yourself? Dan Ackroyd 0:37 Hi Derick, my name is Daniel, I'm the maintainer of the imagick extension, and I occasionally help other people with RFCs. Derick Rethans 0:45 And in this case, you helped out Ilija with the null safe operator RFC. Dan Ackroyd 0:50 It's an idea that's been raised on internals before but has never had a very strong RFC written for it. Ilija did the technical implementation, and I helped him write the words for the RFC to persuade other people that it was a good idea. Derick Rethans 1:04 Ilija declined to be talking to me. Dan Ackroyd 1:06 He sounds very wise. Derick Rethans 1:08 Let's have a chat about this RFC. What is the null safe operator? Dan Ackroyd 1:13 Imagine you've got a variable that's either going to be an object or it could be null. The variable is an object, you're going to want to call a method on it, which obviously if it's null, then you can't call a method on it, because it gives an error. Instead, what the null safe operator allows you to do is to handle those two different cases in a single line, rather than having to wrap everything with if statements to handle the possibility that it's just null. The way it does this is through a thing called short circuiting, so instead of evaluating whole expression. As soon as use the null safe operator, and when the left hand side of the operator is null, everything can get short circuited, or just evaluates to null instead. Derick Rethans 1:53 So it is a way of being able to call a methods. A null variable that can also represent an object and then not crash out with a fatal error Dan Ackroyd 2:02 That's what you want is, if the variable is null, it does nothing. If a variable was the object, it calls method. This one of the cases where there's only two sensible things to do, having to write code to handle the two individual cases all the time just gets a bit tedious to write the same code all the time. Derick Rethans 2:20 Especially when you have lots of nested calls I suppose. Dan Ackroyd 2:25 That's right. It doesn't happen too often with code, but sometimes when you're using somebody else's API, where you're getting structured data back like in an in a tree, it's quite possible that you have the first object that might be null, it's not null, it's going to point to another object, and the object could be null so and so so down the tree of the structure of the data. It gets quite tedious, just wrapping each of those possible null variables with a if not null. Derick Rethans 2:55 The RFC as an interesting example of showing that as well. Is this the main problem that this syntax or this feature is going to solve? Dan Ackroyd 3:03 That's main thing, I think there's two different ways of looking at it. One is less code to write, which is a thing that people complain about in PHP sometimes it being slightly more verbose than other languages. The thing for me is that it's makes the code much easier to reason about. If you've got a block of code that's like the example in the RFC text. If you want to figure out what that code's doing you have to go through it line by line and figure out: is this code doing something very boring of just checking null, or is it doing something slightly more interesting and, like maybe slipping in a default value, somewhere in the middle, or other possible things. Compare the code from the RFC, to the equivalent version using the null safe operator, it's a lot easier to read for me. And you can just look at it. Just see at a glance, there's nothing interesting in this code going on. All is doing is handling cases where some of the values may be null, because otherwise you can just look at the right hand side of the chain of operators, see that it's either going to be returning null, or the thing on the right hand side. So for me, it makes code a lot easier to reason about something. I really appreciate in new features that languages acquire. Derick Rethans 4:17 From what I can see it reduces the mental overhead quite a bit. As you say, the full expression is either going to be the null, or the return type of the last method that you call. Dan Ackroyd 4:29 There's nothing of value in between. So all of that extra words is wasted effort both writing it, and reading it. Derick Rethans 4:37 Okay, you mentioned short circuiting already, which is a word I stumble over and had to practice a few times. What is short circuiting? Dan Ackroyd 4:45 Very simple way of explaining it is this is similar to an electronic circuit when it's short circuited the rest of the circuit stops operating. It just the signal stop just returns back to earth. The null safe short circuiting, it means that when a null is encountered the rest of the chain of the code is short circuited and just isn't executed at all. Derick Rethans 5:08 Okay, what you're saying is that if you have a variable containing an object or working objects, you call a method that returns a null. And then, because it's null it won't call the next method on there any more. Dan Ackroyd 5:20 Yes, it won't execute the rest of the chain of everything after a short circuit, takes place. Derick Rethans 5:27 The RFC describes multiple way of short circuiting in, to be more precise, it's talks about three different kinds of short circuiting. Which ones are these and which ones have been looked at? Dan Ackroyd 5:37 $obj = null; $obj?->foo(bar())->baz(); There's apparently three different ways to do short circuiting. And the way this has been implemented in PHP is the full short circuiting. One of the alternative ways was short circuiting for neither method arguments, nor chained method calls. Imagine you've got some code that is object method call foo. And inside foo there's a function called bar and then after the method call to foo, there's a another method call to baz. First way that some other languages have is short circuiting when basically no short circuiting. So, both the function call would be called, and also then the method call baz, would also be called. And this option is quite bad for both those two details, having the function call happen when the result of that function call is just gonna be thrown away is pretty surprising. It just doesn't seem that sensible option to me, and even worse than that, the chaining of the method call as is pretty terrible. It means that the first method call to foo could never return null, because the, there's no short circuiting happening. Each step then need to artificially put short circuiting after the method call to foo, which is a huge surprise to me. This option of not short circuiting either method, arguments, or chained method calls seems quite bad. Derick Rethans 7:04 If one of the methods set wouldn't have been called normally because the objects being called is null, and the arguments to that function are, could be functions that could provide side effects right. You don't know what's the bar function call is going to do here for example. Dan Ackroyd 7:18 It just makes the whole code very difficult to reason about and quite dangerous to use short circuiting in those languages. Derick Rethans 7:25 It's almost equivalent that if you have a logical OR operation right. If the first thing is through evaluates to true and you do the OR the thing behind the OR it's never going to be executed, it's a similar thing here and I suppose, except the opposite. Dan Ackroyd 7:40 It's very similar in that people are used to how short circuiting works in OR statements. And for me, similar sort of short circuiting behaviour needs to happen for null safe operator as well for it to be not surprising to everybody. Derick Rethans 7:55 This is the first option short circuiting never for neither method arguments, for chained methods calls. What was the second one? Dan Ackroyd 8:02 So the second one is to short circuit for method arguments but not for chaining method calls. This scenario, the call to bar wouldn't take place, but the call to the subsequent method call of baz would still take place. This is slightly less bad, but still, in my opinion, not as good as for short circuiting, again because even if the method call foo should never return null, cause the null propagates through the chain in the expression, then have to artificially use a another null safe operator to avoid having a can't call methods on parent. Derick Rethans 8:41 And then the third one, which is the short circuiting for both method arguments and chained method calls. Dan Ackroyd 8:47 That's the option has been implemented in PHP. This is the one that is most sensible, in my opinion. Soon as the short circuit occurs, everything in the rest of the chain of operators that applies to that objects, get short circuited. To me is the one that is least surprising and the one that everyone's going to expect for it to work in that way. Derick Rethans 9:08 So I've actually I have a question looking at the code that you've prepared here where it says: object, question mark, arrow, foo, which is the syntax. We didn't mention the syntax yet. So it is object, question mark, arrow, foo. And the question mark being the addition here. Would it make sense that after foo, instead of using just the article baz to use question mark baz. It'd be a reason why you want to do that? Dan Ackroyd 9:33 There are only for languages that don't do full short circuiting. For the languages that don't do full short circuiting and the null makes its way through to then have baz called on it, you have to use another null safe operator in there, just to avoid another error condition happening. Derick Rethans 9:51 Very well. Which other languages, actually have full short circuiting? Dan Ackroyd 9:55 So the languages that have full short circuiting are C sharp, JavaScript, Swift, and TypeScript, and the languages that don't have for short circuiting are Kotlin, Ruby, Dart, and hack. I'm not an expert on those languages but having a quick look around the internet, it does seem to be that people who try to use null safe operator in the languages that don't implement full short circuiting are not enjoying the experience so much. To me it appears to be a mistake in those languages. I don't know exactly why they made that choice to imagine that is more a technical hurdle, rather than a deliberate choice of this is the better way. It does appear that implementing the full short circuiting is quite significantly more difficult than doing the other option, other types of short circuiting, just because the amount of stuff that needs to be skipped to the full short circuiting, so you've got to imagine that they thought it was going to be an acceptable trade off between technical difficulty and implementation. The, I think that's probably going to be useful enough. To me it just doesn't seem to be that great of a trade off. Derick Rethans 11:07 Short circuit circuiting happens when you use the null safe operator. So the null safe operator that syntax is question mark arrow. You mentioned that there is a chain of operators, what kind of operators are part of this chain or what is this chain specifically? Dan Ackroyd 11:21 So the null safe operator will by, and short circuit, a chain of operators, and it will consider any operators that look inside or appear in side and object to be part of a chain, and operators that don't look inside an object to be not part of that chain. So the ones that appear inside an object are property access, so arrow, null safe property access. So, question mark arrow. Static property access, double colon. Method call, null safe method call, static method call, and array access also. Other operators that don't look inside the object would be things like string concatenation or math operators, because they're operating on values rather than directly like pairing inside the object to either read a property or do mythical, they aren't part of the chain. They'll still be applied, and they will be part of the chain that gets short circuited Derick Rethans 12:17 Which specific chain is used as an argument in a method called as part of another, what sort of happens here? Dan Ackroyd 12:27 This is at the limit of my understanding, but the chains don't combine or anything crazy like that. It's only in the object type operators here inside the objects that might be null. That will continue a chain on a chain of operators is then used as a argument to another method call or function call. That's the end of that chain, and they'd be two separate chains, so for me there's no surprising behaviour around the result of a non safe operator being used as an argument to another function call, or another method call, which might have a separate null safe operator on the outside, those two things are independent. They don't affect each other. Derick Rethans 13:09 Yeah, I think that seems like a logical way of approaching this. Otherwise, I expect that the implementation will be quite a bit trickier as well. Dan Ackroyd 13:17 This is actually something I consider quite deeply when people come up with RFCs is, how would I explain this to a junior developer. If the answer is: I would struggle to explain this to a junior developer. That probably means that, one I don't understand the idea well enough myself, or possibly that the idea is just a bit too complicated for it to be used safely in projects. I mean, there's a difference between stuff being possible to use safely. And we've got things in PHP that are possible to use safely but aren't always going to be used safely, like eval and other questions which can be used quite dangerously, and the general rule for a junior developer would be: You're not allowed to use eval in your code, you need to have a deep understanding of how it can be abused by people. But something for like the null safe operator. It's got to work in a way that's got to be safe for everyone to use without having a really deep understanding of what the details are of its implementation. Derick Rethans 14:14 That makes a lot of sense yes. The RFC talks about a few possibilities that are not possible to do with a null safe operators. What are these things it talks about? Dan Ackroyd 14:25 Until right before the RFC went for a vote. There was, as part of the RFC there was quite a bit more complexity. And it was possible to use null safe in the write context. So you could do something like null safe operator bar equals foo, effectively assigning the string foo something that might or might not be null. I only learnt this recently. You can also foreach over an array into the property of an object, which I've never seen before in my 20 odd years writing PHP. It would be possible to use the null safe operator in there. And you'd be foreaching over an array into either a variable or into null, which is not a variable. The RFC was trying to handle this case, these cases are generally referred to as write contexts, as opposed to just read contexts where the value is being read. The RFC has a lot of work went into supporting the null safe operator in these write contexts. But luckily, somebody pointed out that this was just generally a rubbish idea, and hugely complex. It has a problem that you just fundamentally can't assign a variable or a value to a possibly non existent container. It just doesn't make any sense. So, a couple of days before it went to a vote, Ilija just asked the question, why don't we just restrict it to read contexts. That's where the value of the RFC is, making it easier to write code that's either going to read a value or call a method on something that might or might not be null. All of this stuff around write context was just added complexity that doesn't really deliver that much value. It's nothing to do with the using null safe operator in write context was removed from the RFC, which made it a whole lot simpler. It made it a lot less likely that people would like to code that doesn't do what they expected. Derick Rethans 16:17 I also think it would be a lot less likely to have been accepted in that case, as it stands the vote for this feature was overwhelmingly in favour. Did you think it was going to be so widely accepted or did you think the vote was going to be closer? Dan Ackroyd 16:31 So I thought it was gonna be a lot closer. There are quite a few conversations on the internet, where people raise a point that one I don't fully agree with it was a valid point to make. They were concerned that people use null safe operator in places were having null, rather than an object might represent an error. And if people are just using the null safe operator to effectively paper over this error in their code, then it would make figuring out where the null came from, and what error had occurred to cause it to be there. I think the answer to that is this feature isn't appropriate use in those scenarios. If a variable being null, instead of an object represents an error in your code, then you shouldn't be using this null safe operator to skip over that error condition. You need to not use it and watch a bit more code that explicitly defines, or checks that error, handles it more appropriately, rather than just blindly using this feature, without thinking if they pick your use case. Derick Rethans 17:30 Or of course thrown exception, which is what traditionally is done for this kind of error situations right? Dan Ackroyd 17:35 be a thing so it shouldn't be null, but there's very small chance it is, but only is in situations where you're not aware and throwing an exception so you can error out, and then debug it is the correct thing to do, rather than just silently have errors in your application. Derick Rethans 17:50 Would you have anything else that to the null safe operator? Dan Ackroyd 17:52 Not really. Except say it's quite interesting that quite a few of the new features for PHP eight are, don't technically allow anything new to be done, there just remove quite a bit of boilerplate. For me, it'll be interesting to see the reaction to that from the community because this is something people have criticized PHP for for quite a long time, they're being very verbose language, particularly compared to TypeScript, where a lot of very basic creating objects can be done in very few lines of code. So between the null safe operator, and the object property promotion, that's for some projects, which use a lot of value types, or data from other services where you don't have control over how it's structured, I think these two features are going to remove a lot of boilerplate code. So I think this might improve people's experience of PHP quite dramatically. Derick Rethans 18:40 That ties back into this sort of idea that Larry Garfield has with his object ergonomics, right, especially with the data values that the document was referring to mostly. Dan Ackroyd 18:50 Yeah. Derick Rethans 18:51 I've another question for you, the last one. Which is: What's your favourite new feature in PHP eight? Dan Ackroyd 18:56 Generally the improvements to the type systems are going to cheat by giving two answers. Union types. This will actually be really nice for the imagick extension just because a huge number of the methods in there, accept either string or an imagick pixel object which represents colour internally to the imagick library. Have been able to have correct types on all the methods that currently don't have the right, don't have the correct type information, will be very, very nice. Doesn't make anything new possible. It just makes it easier to reason about the code, so it's easier for tools like PHP Stan to have the correct type of information available rather than having to look elsewhere for the correct type information. Again the mixed types, is very small improvements to the type system in PHP, but it's another piece that helps complete the puzzle of the pulling out the type system to make it be closer to being a complete a type system that can be used everywhere, rather than having to have magic happening in the language. Derick Rethans 20:02 It also will result in more descriptive code right because all the information that you as well as PHP Stan need to have to understand what this is saying or what I was talking about. It's all right there in the code now. Dan Ackroyd 20:15 Yeah, and having the information about what code's doing in the code, rather than in people's heads, makes it easier for compilers, and static analysers to do their jobs. Derick Rethans 20:26 Thank you, Dan for taking the time this afternoon to talk to me about the null safe operator. Dan Ackroyd 20:31 No worries Derick, very nice to talk to you as always. Derick Rethans 20:35 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Null Safe Operator Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 64: More About Attributes

July 30, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 64: More About Attributes London, UK Thursday, July 30th 2020, 09:27 BST In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about a few RFCs related to Attributes. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:17 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 64. Today I'm talking with Benjamin Eberlei, about a bunch of RFCs related to Attributes. Hello Benjamin, how are you this morning? Benjamin Eberlei 0:36 I'm fine. I'm very well actually yeah. The weather's great. Derick Rethans 0:39 I can see that behind you. Of course, if you listen to this podcast, you can't see the bright sun behind Benjamin, or behind me, by the way. In any case, we don't want to talk about the weather today, we want to talk about a bunch of RFCs related to attributes. We'll start with one of the RFCs that Benjamin proposed or actually, one of them that he has proposed and one of them that he's put into discussion. The first one is called attribute amendments. Benjamin Eberlei 1:05 Yes. Derick Rethans 1:06 What is attribute amendments about? Benjamin Eberlei 1:08 So the initial attributes RFC, and we talked about this a few months ago was accepted, and there were a few things that we didn't add because it was already a very large RFC, and the feature itself was already quite big. Seems to be that there's more sort of an appetite to go step by step and put additional things, in additional RFCs. So we had for, for I think about three or four different topics that we wanted to talk about, and maybe amend to the original RFC. And this was sort of a grouping of all those four things that I wanted to propose and change and Martin my co author of the RFC and I worked on them and proposed them. Derick Rethans 1:55 What are the four things that your new RFC was proposing? Benjamin Eberlei 1:59 Yes, so the first one was renaming Attribute class. So, the class that is used to mark an attribute from PHP attributes to just attribute. I guess we go into detail in a few seconds but I just list them. The second one is an alternative syntax to group attributes and you're safe a little bit on the characters to type and allowed to group them. And the third was a way to validate which declarations, an attribute is allowed to be set on, and the force was a way to configure if an attribute is allowed to be declared once or multiple times on one declaration. Derick Rethans 2:45 Okay, so let's start with the first one which is renaming the class to Attribute. Benjamin Eberlei 2:51 Yeah, so in the initial RFC, there was already a lot of discussion about how should this attribute class be caught. Around that time that PhpToken RFC was also accepted, and so I sort of chose this out of a compromise to use PhpAttribute, because it is guaranteed to be a unique name that nobody would ever have used before. There's also, there was also a lot of talk about putting an RFC, to vote, about namespacing in PHP, so namespacing internal classes and everything. So I wanted to keep this open at the at the time and not make it a contentious decision. After the Attributes RFC was accepted, then namespace policy RFC, I think it was called something like that, was rejected, so it was clear that the core contributors didn't want a PHP namespace and internal classes to be namespaced, which means that the global namespace is the PHP namespace. And then there was an immediate discussion if we should rename at PhpAttribute to Attribute, because the prefix actually doesn't make any sense to have there. So the question was, should we introduce this BC break potentially, because it's much more likely that somewhere out there has a class Attribute in the global namespace. And I think the discussion, also one, one person mentioned that they have it in their open source project. Suppose sort of the the yeah should we rename it and introduce this potential BC break to, to users that they can't use the class Attribute and the global namespace anymore. Derick Rethans 4:25 So that bit passed? Benjamin Eberlei 4:26 That bit passed with a large margin so everybody agreed that we should rename it to Attribute. I guess consistent with the decision that the PHP namespace is the global namespace. It's sort of documented in a way. Everybody knows that we have the namespace feature for over 10 years now in PHP or oma, yeah exactly 10 years I guess. So by now, userland code is mostly namespaced I guess, and we can be a bit more aggressive about claiming symbols in the global namespace. Derick Rethans 4:54 Second one is grouping, or group statements in attributes, what's that? Benjamin Eberlei 4:59 The original syntax for the RFC, for attributes uses ... the ... we had this before. Remember? I can't spell, I don't know the HTML opening and closing signs. Derick Rethans 5:15 The lesser than and greater than signs. Benjamin Eberlei 5:17 Two lesser than signs and then two greater than signs, for the opening and closing of attributes. And this is quite verbose. So if you have multiple attributes on one declaration, it could be nice to allow to use the opening and closing brackets only once, and then have multiple attributes declared; so that was about the grouping. There was a sentence in there that should there be an RFC that sort of changes the syntax and that would become obsolete. It was also the most contentious vote of all the four because it almost did not pass but it closely passed. Derick Rethans 5:56 32-14 Yeah, yeah. So the group statements also made it in. You can now separate the arguments by comma. The third one is validate attribute target declarations. Benjamin Eberlei 6:07 That one makes a lot of sense to have from the beginning. What is does is, whenever you create a new attribute, you can configure that this attribute can only be used for example on a class declaration, or only on a method and function declaration, only on a property declaration. And so there are different targets, what we call them, where an attribute can be put on. This is something that will be required from the beginning. So, if you start using attributes, they will only work on certain declarations. Let's say we are configuring routes. So, the mapping of URL to controllers using attributes in a controller can only ever be a function, or a method. So it makes sense to require that the route attribute is only allowed on the function and the method. This should be a good improvement. Another example would be if you use attributes for ORM. So database configuration, then you would declare properties of classes to be columns in a database table, but it would never makes sense to declare the column attribute on a method, or on a class, or something like this. So it would be nice if the validation fails, or the usage of the attribute fails on these other declarations. Derick Rethans 7:25 When is this target checked? Benjamin Eberlei 7:27 The target checks are only done, I started referring to this as deferred validation, so that the validation is deferred to the last possible point in the usage of attributes. Which is, you call the method newInstance(), so give me an instance of this attribute on the ReflectionAttribute class. This has been a point of discussion because it's quite late, it will not fail at compile time, for example, even though it potentially could. The idea to do this at this late point is: we cannot always guarantee that an attribute is actually available when the code is compiled. For example if you use attributes of PHP extensions, the extension might be disabled. And also, you might have different attributes of different libraries on the same declaration. If we would validate them early it could potentially lead to problems. Well, sort of one usage of one attribute of one library breaks others. We decided to make this validation as late as possible from the language perspective. I already see that potentially static analysis tools like Psalm, PHP Stan and IDEs will probably put it in your face very early that you're using the attribute on the wrong declaration. And I guess I don't know static analysis tools in PHP have gotten extremely good and more and more people are starting to use them. And I really like the sort of approach that the language at some point can't be too strict. And then the static analysis tools, essentially, introduce an optional build step that people can decide to use where the validation is happening earlier. Derick Rethans 9:06 Where do you define what the targets for an attribute are? Benjamin Eberlei 9:10 This was also a lot of discussion. We picked the most simple approach; the Attribute class gets one new argument: flags, and it is a bit mask, and by default, an attribute is allowed to be declared on all declarations; classes, methods, properties, and so on. You can restrict the flags, the bitmask, to only those targets that you want to allow. So you would, let's say you have the route attribute we talked about before, you specify this as an attribute by putting an attribute at the class declaration. And then you pass it the additional flags and say only allowed using the bitmask for function method. I think it's just one constant, you use a constant to declare it's only allowed on a method or a function declaration. Derick Rethans 9:59 Can you for example set it a target is both valid for a method and a property? Benjamin Eberlei 10:02 You can, because it's a bit mask you can just use the OR operator to combine them, and allow them to be declared on both. Derick Rethans 10:10 It's also passed, so this is also part of PHP eight. And then the last one is the validate attribute repeatability Benjamin Eberlei 10:20 Repeatability essentially means you're allowed to use an attribute once, or multiple times on the same declaration. Coming back to the routing example, if you're having route attribute, then you could say this is allowed multiple times on the same function, because maybe different URLs lead to the same controller. But if you have a column property for example, you could argue, maybe it doesn't make sense that you define a column twice on the same property so it's mapped to two columns or something. So you could restrict that it's only allowed to be used once. This is also validated again deferred, so only when newInstance() called, it would throw an exception if it's sort of the second or third usage of an attribute but it's only allowed to be there once. Derick Rethans 11:06 What's the default? Benjamin Eberlei 11:07 The default is that the attributes are not allowed to be repeatable and you would, again using the flag arguments to the attribute class, put the bit mask in there that it's allowed to be repeatable. Derick Rethans 11:21 Is there anything else in that RFC. Benjamin Eberlei 11:23 No. Derick Rethans 11:25 The second one was actually something that came out of the first one that she did. The original RFC already mentioned that she could for example have attributes called deprecated, or JIT, or no JIT. I suppose this RFC fleshes out the deprecated attribute a little bit more? Benjamin Eberlei 11:40 Idea was to introduce an attribute "deprecated", which you can put on different kinds of declarations, which would essentially, replace the usage of trigger_error(), where you would say, e_user_deprecated and then say for example: this method is deprecated use another one instead. Benefit of having an attribute versus using trigger_error() is that static analysis tools have an easier way of finding them. It's both a way to make it more human readable because attributes are declarative, it's easy to see them and spot them. And also machine readable. Both developer and machine could easily see that this method, or this class is deprecated, and stop using it, or using the alternative instead. The way it was done at the moment, or until now, mostly you're using a PHP doc block @deprecated in there. Please understand that. You can also read it as human, of course, the problem is that it has no really no runtime effect. That means during development for example, you couldn't see that you're accidentally using deprecated methods and collect this information, or even on production for example, because it's only in a doc block. And with an attribute, it would be possible to have this information at runtime to collect it, aggregate this information so that you see, it's been deprecated code is being used, and also make it human readable and for the IDEs and everything. Derick Rethans 13:12 When I looked at this RFC, it was under discussion, are you intending to bring this up for a vote, before 8.0 because you don't have a lot of time now? Benjamin Eberlei 13:19 I'm not. The reason is that first, you say, I'm running out of time, and I'm a bit stressed at the moment. Anyways, and even then, there are two points that I wanted to tackle it first. I mentioned that you can use the deprecated attribute on methods and functions which is something people are using now. One thing that I saw would be a huge benefit which is not possible at the moment is to be deprecating the use of classes, vacating the use of properties and constants. This is something you cannot hook into at runtime at the moment. So you cannot really trigger it. Essentially has led to a lot of hacks specifically from the Symfony community because they are very very good about deprecations. So they are, they have a very very thorough deprecation process, and they have come up with a few workarounds to make it possible to trigger this kind of things that would like to make this a much better supported case from the language. The problem there is that it touches the error handling of PHP and the usage of the error handler for deprecation triggers is a contentious problem, because it triggers global error handlers as by this side effects, sometimes effects on the performance because it logs to a file. There was a lot of questions if we can improve this in some way or another, and I haven't found a good way for this yet and I hope to work on this for a month. Derick Rethans 14:47 Maybe 8.1 then. The third RFC that goes into attributes, is the shorter attribute syntax RFC. What's wrong with the original syntax, where you have lesser than, lesser than, attribute greater than, greater than? Benjamin Eberlei 15:03 The name implies, the verbosity is a problem, I would agree. So writing, writing the symbols lesser than or greater than twice, yeah, takes up at least four symbols. There's one problem with shorter attribute syntax would be nice. Problem I guess is that this syntax is sort of a unicorn. We do refer to HHVM as one language or Hack, in that case one language that uses it. Whatever the Hack has essentially no adoption and is not very widespread in usage, and since they have parted ways with PHP. They also have a pass to removing the syntax and they, I think plan to change the syntax to use the @ symbol instead of the lesser and greater. So one problem would be that we were like, language wise, we would be a unicorn by using a syntax that nobody else is using this also I guess a problem. And then there are some, some problems that focusing on future RFCs. Something that I don't see myself really advocating or wanting, is allowing to nest attributes, so one attribute can reference another attribute, so that you have a sort of an object graph of attributes stuck together. This is necessary if you want to configure, if you want to use attributes to use very detailed configuration. Not sure if it makes sense at some point. There's this trade off where you should go to XML, YAML, or JSON configuration instead off attribute based configuration. To confusion with potential future usages of PHP and existing usages, so: generics. PHP would ever get generics, in some way, it would use the lesser and greater than symbols, but only once. I cleared up with Nikita before that there's no conflict between so that that would not be a problem. However, if we had the symbol used, they would be near each other. So attributes would come first and then there would be the function declaration using generics and types. These symbols would be very near each other so it might introduce some confusion when reading the code. Then there's obviously the existing usage of this shift up operators so if you're doing shift operations with bits, then they are using exactly the same symbol, the lesser than, greater than symbol twice, to initiate this or declare this operation. Derick Rethans 16:18 I realized there's a few issues with the pointy pointy attribute pointy pointy, to give it a funnier name. Benjamin Eberlei 17:34 Pointy pointy is good. Derick Rethans 17:41 I don't think I've come up with a name for a specific spaceship that looks like this yet, but there were a few other proposals. The original RFC already had the @:, which was rejected in that RFC. There was another proposal made that uses the @@ syntax to do attributes. It this something used by all the languages. Benjamin Eberlei 18:01 Syntax was the most discussed thing about attributes, so everybody sort of agreed that we need them or they we benefit. Nobody could agree on the syntax. So there were tons of discussion everywhere about how ugly the pointy pointy syntax is, and because we're using the @ symbol and the doc blocks already you by we couldn't use the @ symbol for attributes themselves, so there are lots of discussions running in circles. Just using the @ symbol once it's just not possible because we have the error suppression operator. Then a lot of people always suggest okay let's get rid of that, because it's anyways it's bad practice to use it Derick Rethans 18:38 It's a bad practice but sometimes it's necessary to use it. Benjamin Eberlei 18:41 So there are some APIs in the core that cannot be used without them. For example, unlink, mkdir, fopen. Because of race conditions you have to use the symbol there in many cases. Even if we would deprecate and remove that sort of syntax we would still somehow need it in a way for these functions. So I'm not seeing that we will ever remove them, and even then it will take two cycles of major versions, so it doesn't make sense. That one is out. The pointy syntax was something that Dmitri proposed many years before and people were sort of in favour of it, so I didn't change it for the initial RFC. We looked a lot at different syntaxes but none of them were nice and didn't break BC. And then when the vote started some core contributors proposed that maybe we can do slight BC breaks in the way we have existing symbols and one proposal was to use @@, which means we are using the @ symbol twice. This is actually allowed in PHP at the moment, we consider breaking this not a big problem because you can just use it once and it has the same effect. Problem would only be going through the code and finding all @@ usages which we assumed are zero anyways. Derick Rethans 19:57 Is there any other language that uses @@? Benjamin Eberlei 19:59 A few other languages use just to @, like many other, like many people propose PHP should also do, which I explained I guess is impossible. It's only proposed because it's looking close enough to @, and so it wouldn't be a huge difference and sort of familiar in that way. Derick Rethans 20:18 There was another alternative syntax proposed which is #[ attribute ]. Where does this come from? Benjamin Eberlei 20:25 So the @@ syntax is different to the pointy the syntax that doesn't have a closing symbol, it only has the starting symbols. Nikita and Sara both said that would be fine as a BC break and Sara is the release master for PHP eight, so she has a lot of pull on these kinds of things. Nikita as well. So, they said it's okay to have small BC breaks, we did look at other syntaxes. I saw that, which I liked before already but couldn't use because would be BC break, was the yeah hash, and then square brackets, opening and there's a closing square bracket close. This is used in Rust. It is a small BC break in PHP because the hash is used as a one line comment. It's not used that much any more, by people to use comments because coding styles discourage the use of it. I use it myself sometimes, like, specifically in one time shell scripts and stuff it's nice to use it, to use the hash as a comment. And the change, would there be that if you have a hash, followed by any other symbol than an opening square bracket, it would still be a comment. If the first symbol is square bracket opening, then it would be parsed as an attribute instead. Derick Rethans 21:42 So this is something you say that rust uses? Benjamin Eberlei 21:43 Yes. Derick Rethans 21:44 Because they're pretty much three alternatives now, we have the original pointy pointy one, which I still think is the best name for it, @@ operator, and then the hash square brackets operator. With three alternatives it's difficult to do our normal voting which is, how would you two thirds majority, but with three variants that is a tricky thing. As a second time ever, I think we use STV to pick our preferred candidate. Well this is STV kind of thing. Benjamin Eberlei 22:11 You need to help me with the abbreviation any ways resolving it. Derick Rethans 22:14 Single transferable vote. Benjamin Eberlei 22:16 STV is a process where you can vote on multiple options, and still make sure that the majority wins, or that the most preferred one wins. Ensure that it has a majority across the voters. The way it works is that everybody puts up that their preferences. There are many different options, in our case three, everybody puts their preference, I want this one first, this one second, is one third. And then the votes are made in a way where we look at the primary votes, who gave whom the primary vote. And then the option that has the least votes gets discarded. All those people that pick the least one, they have a second preference, and we look at the second preference of these voters and transfer, sort of change, the votes, their originally, but discarded option to the options that are still available. Now, somebody has the majority, they win if they don't, we discard another option and it goes on and on. With just three options, it's a transfer of a vote would essentially only happen once, because once one is discarded, one of the two other ones, not must have because it can be a tie, but probably has a majority, yeah. Derick Rethans 23:34 In this case, when we're looking at votes , when were counted them, it actually was obvious that no transferring of votes was actually necessary because the @@ won outright, and had quorum. In the end it up ended up being a pointless exercise, but it's a good way of picking among three different options. Which one won? Benjamin Eberlei 23:53 The @@ syntax won over the Rust Rusty attributes, and the pointy, pointy pointy attributes. One was clearly chosen as a winner, I think, but 54%, 56% in the first vote already. However, one thing that came up in the sort of aftermath of the discussion is that the grammar of this new syntax is actually not completely fine parse any language like PHP or any other programming languages to have defined the grammar and then the grammars inputs to a parser generator, as required some hacks to get around the possibility of using namespaces relative name step basis afterwards. This is something where Nikita satisfaction not allowed. In the way PHP builds its grammar, without any further changes @@ is actually going to be rejected as a syntax. However, Nikita independently worked on sort of change in the tokensets of PHP, that would provide a workaround, or not a workaround. It's actually a fix for the problem. If this new way of declaring tokens for namespaces would be used in PHP eight, then it's also possible to use the @@ syntax for attributes. If we don't change it, then the @@ syntax is actually not going to make it. Derick Rethans 25:18 In my own personal opinion, I think, @@ is a terrible outcome of voters is going to be so does that mean I should vote against Nikita's namespace names as token RFC or not? I mean it's a bit unfair. Benjamin Eberlei 25:30 I will vote for it. The premise of Nikita's RFC, and he worked on it before the @@ problem became apparent, so he did not only work on it to fix this problem. The problem he's trying to fix is that parts of namespace could use potentially keywords that we introduce as keywords in the future. An example is the word enum, and people are talking about adding enums to PHP for years. He had also one use case where one of his own PHP libraries broke by the introduction of the FN keyword for PHP 7.4, the short closure syntax. Essentially, it would help if some part of a namespace would contain a word that becomes a keyword in the future, that it wouldn't break the complete namespace. Because at the moment, the namespace. In this case is burned. You can't even use it anywhere in the namespace. The key word, which I guess it's a problem. But this reason I am actually for this proposal that Nikita has, even though I agree with you. I was actually a proponent of the Rusty attribute syntax, so I would have preferred to have that very much myself. Derick Rethans 26:40 I hope to talk to Nikita about what this namespace token thing is exactly about in a bit more detail in a future episode. Feature freeze is next week, August 4th. Are you excited about what PHP eight is going to be all about? Benjamin Eberlei 26:54 It's been a lot of last minute, things that have been added that to look really, really great. I can't remember PHP seven any more that closely, but I remember that it was the case with PHP seven as well. So anonymous classes was added quite late. Essentially PHP seven initially was just about the performance and then there was a lot of additional nice stuff added, very late, and made it a from a future perspective, very nice release, and it seems, it could be the same for PHP eight. Match syntax, attributes, then potentially the named parameters, which at the time of our speaking is still an open vote but looks very very promising. Derick Rethans 27:36 And then of course we have union types as well, and potentially benefits of the JIT right? Benjamin Eberlei 27:40 Union types this already accepted for so long I forgot about it. Derick Rethans 27:45 Alpha versions of PHP eight zero are already out, so if you want to play with these features you can already, and please do. And if you find bugs, don't assume it's you doing something wrong, instead file a bug report at https://bugs.php.net. Thank you, Benjamin, for talking to me about more attributes, this morning. Benjamin Eberlei 28:03 Thank you very much for having me. Derick Rethans 28:06 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Attribute Amendments RFC: Deprecated attribute RFC: Shorter Attribute Syntax RFC: Treat namespaced names as single token Psalm PHPStan Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 63: Property Write/Set Visibility

July 23, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 63: Property Write/Set Visibility London, UK Thursday, July 23rd 2020, 09:26 BST In this episode of "PHP Internals News" I talk with André Rømcke (Twitter, GitHub) about an RFC that he is working on to get asymmetric visibility for properties. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 63. Today I'm talking with André Rømcke, about an RFC that he's proposing titled property write/set visibility. Hello André, would you please introduce yourself? André Rømcke 0:38 Hi Derick, I'm, where to start, that's a wide question but I think I'll focus on the technical side of it. I'm been doing PHP for now 15 plus years. Actually started in, eZ systems, now Ibexa, met with you actually Derick. Derick Rethans 0:56 Yep that's a long time ago for me. André Rømcke 0:58 And well I came from dotnet and front and side of things. Eventually I learned more and more in PHP and you were one of the ones getting me deeper into that, while you were working on eZ components. I was trying to profile and improve eZ publish which what's left of the comments on that that point. A long time ago, though. A lot of things and I've been working in engineering since then I've been working now actually working with training and more of that, and the services and consulting. One of the pet peeves I've had with PHP has been properties for a long time, so I kind of wanted several, several years ago to look into this topic. Overall, like readonly, immutable. To be frank from our side that the only thing I really need is readonly, but I remember the discussions I was kind of involved for at least on the sideline in 2012/ 2013. Readonly and then there was property accessors. And I remember and there was a lot of people with different needs. So that's kind of the background of why I proposed this. Derick Rethans 2:04 We'll get back to these details in a moment, I'm sure. The title of the RFC is property write/set visibility. Can you give me a short introduction of what visibility actually means in this contract, in this context? André Rømcke 2:16 Visibility, I just use the word visibility because that's what doc usually in php.net says, but it's about access the property, so to say, or it being visible to your code. That's it on visibility but today we can only set one rule so to say, we can only say: this is either public, private, or protected, and in other languages, there are for good reasons, possibilities to have asynchronous visibility. So, or disconnected or whatever you want to call it, between: Please write and read. And then in, with accessors you'll also have isset() and unset() but that's really not the topic here at this point. Derick Rethans 2:56 PHP sort of supports these kind of things, with it's magic methods, like with __get() and __set(). And then also isset() and unset(). You can sort of implement something like this, of course, in your own implementation, but that's, of course, ugly, I mean, ugly or, I mean there was no other way for doing it and I remember, you made have a user that in eZ Components that you mentioned earlier. What is the main problem that you're wanting to solve with what this RFC proposes? André Rømcke 3:25 the high level use case is in order to let people, somehow, define that their property should not be writable. This is many benefits in, when you have multiple API's, in order to say that I have this property should be readable. But I don't want anyone else about myself to write to it. And then you have different forms of this, you have either the immutable case were you literally would like to only specify that it's only written to in constructor, maybe unset in destructor, may be dealt with in clone and so on, but besides that, it's not writable. I'm not going into that yet, but I'm kind of, I was at least trying to lay the foundation for it by allowing the visibility or the access rights to be asynchronous, which I think is a building block for moving forward with immutability, we only unintentionally also accessors but even, but that's a special case. Derick Rethans 4:24 What is the syntax that you're suggesting to implement this, this feature? André Rømcke 4:30 Currently in the RFC there's two proposals. There's one where you have, for instance public:private. First you define the read visibility and then you run the find the write visibility. That was based on the email list back in 2012 proposal made back then. The one is inspired by Swift, the apple language, where they actually have a concept for this. So they have public than being the read access and then they have private and in the parenthesis they have (set). Basically this is then set. Benefit of this maybe it will probably be a better match for our reflection API, in terms of getting the modifier names and stuff like that. Secondly, the terminology with set they're kind of just nicely with accessors, so that's why I added this alternative suggestion to the syntax. Derick Rethans 5:27 Would either of these would it be possible to extend this to get actual setters and getters later? André Rømcke 5:32 Yes. Well this is a good question actually and Larry brought this up on the mailing list. In terms of how this actually should align up to a potential future accessors, if that is ever added. Made a good point, we started to rethink this a bit so I think I might have to discuss with him and also Nikita on this, but his point was: okay so if we add this now, how will this look, if you add accessors later to your code. The mindset of thinking I was thinking, this would be an either or. Oh, this would be kind of a shorthand for what you can specify in much more details with accessors. So once you use accessors you would remove this word, because it doesn't have a meaning any more. It's basically language syntax sugar, so to say, or short hand. Derick Rethans 6:21 Besides the two that are in the RFC. Did you consider any other syntaxes that you discarded? André Rømcke 6:26 Yeah I actually had a previous RFC where I, which is linked to in this one, where I was looking into how it might look like with the new attributes, instead for directly for the read only and immutable keywords for instance. Some might find it strange that I'm using referring to both, keywords, because in some languages they mean one and the same. That first draft explored a bit on the attributes syntax how that could work. I like that a lot actually. But it made me realize that underneath, what we're actually dealing with, is asynchronous visibility. So I don't think necessarily people agree with me but I think from at least in how we expose this in a reflection API. It's much nicer to work with if it's purely about, okay, do I have write access or not, as opposed to having to deal with a checking if there is an attribute called readonly in the reflection cote, which for me, looks like code smell or. It looks like it's solving the wrong problem or, I don't know how to phrase it. Derick Rethans 7:32 And I think I agree with you there. I'll always consider attributes as something that modifies the attribute, but visibility would be something that is so inherent of a property that it doesn't really make a lot of sense to do it that way, but that's my feeling about it. I mean, it makes sense to have an attribute for JIT or no JIT because that's some contextual meaning to something else. Similarly, like if you have the ORM attributes or column attributes that people have been suggesting and mainly estate, they convey information about the properties to some other third party tool, not necessarily to PHP itself. Because PHP doesn't care about its JIT or not. It's OPCache cares about it and this would be a similar way right? André Rømcke 8:13 So in this case, it might have been better as a keyword, but I was, so I was exploring that as well and that's what's been done in the past in 2012 and then later in 2018 with that immutable RFC, and then again in 2020 with the writeonce RFC which is doing this, different semantics and focusing more on the write once but anyway those three cases were keywords. I would guess that you would prefer rather a keyboard on for this kind of things. Derick Rethans 8:42 Oh you have keywords already right, I mean private and public are keywords, or the alternatives that you have with having a modifier on when this keyword exists. I mean, is a similar idea of doing it. I mean, I haven't really thought about which ones I prefer more than the other but I would definitely prefer something like this over an attribute. André Rømcke 9:00 Either way, the benefit of doing adding a keyword or attributes for this is that you could, you could also allow them on the class level so you could allow it to be okay, unless anything else is to find then, please let this be the default but all properties be, for instance, immutable, unless they say otherwise. They would need the keyword on for not immutable or something like that. In a immutable case, if you say on the class it's immutable, the whole class should be immutable, and I'm sorry. Derick Rethans 9:27 I don't think it makes sense to that subdivided does it? André Rømcke 9:30 For me at least that distinction, which could exist between readonly and immutable. Readonly, if you look away from how C sharp is defining it and rather look to Rust. In Rust, readonly is basically the modules can read it, but not write it, your module can write to it. So that's the definition of readonly and I think there are use cases for it. But, Marco is challenging me on that so I need to find. Derick Rethans 10:00 It's often good when Marco challenges people on these things because I would trust him with making that kind of semantic choices over many other of the people that are really good working on PHP engine, for example. You slightly touch reflection, that has to be modified to support it, well either keyword or keyword modifier ,or whatever you want to call it in the end. Is there is something else that needs to be changed besides Reflection, or how does reflection, need to be changed? André Rømcke 10:29 Reflection needs to one way or the other expose, at least this RFC, expose the fact that you have now disconnected or asynchronous read and write visibility. But other than that, I don't see a big problem for reflection because, at least, in most cases, code tend to use setAccessible(). This will work as before, basically, you will get access in the instance of the reflection property. If there's any code out there that is checking isPrivate(), isProtected(), isPublic(), they might need to check if the read or write access visibility is as they expect. I haven't never had the need to use this myself but there's probably use cases out there for code dealing with unknown objects, for instance. Derick Rethans 11:20 And I saw in the RFC that you're suggesting to slightly change what isAccessible() does, but also add new methods to specifically check for the setability and getability. What are the names of these methods that you're wanting to add because I can't quite remember? André Rømcke 11:37 So the names of the reflection properties suggested currently in RFC is isSetPrivate(), isSetProtected(), isSetPublic() and then similar for the gets so isGetPrivate(), isGetProtected(), and isGetPublic(). And this would be, in addition to the current isPrivate(), isProtected(), isPublic(), which would be then have to be just a tiny bit to rather affect the visibility of both, so to say. So in the case of public both actually needs to be public for this to return true. Same with protected could also be protected:public or in the case of the other syntax protected, but public set. So, a kind of a write only property, then it would be considered protected, along with the private. Derick Rethans 12:34 So this brings me actually to point that, when seeing that your second syntax which has this set in between parenthesis, to getter didn't have anything within parenthesis. Could it make sense to have, get in parenthesis there instead of having it not marked at all? André Rømcke 12:51 It could be, but this, this this kind of brings us to maybe our downside with the syntax. And also, one, one point that Larry was making in terms of how this would fit with accessors, because it was basically proposing to reverse it. So it would be set colon private for instance, so that the order would be more in line with what might come with accessors later. So, Nikita basically proposed that we move this to a PHP 8.1 and even though I haven't updated the RFC yet I agree. So, I expect us to do more discussions around the syntax and maybe look at this as Nikita proposed on a higher level. Look at how this different needs can be aligned in a better way. Derick Rethans 13:37 You mentioned PHP 8.1, if that's the case, then this is the first RFC where it's discussing for for PHP 8.1, which is kind of nice because of course feature freeze is coming pretty soon now. And that's even closer to when this podcast comes out at the end of July. Is there any potential for breaking backwards compatibility with the addition of the syntax that you're proposing? André Rømcke 13:58 There's a few things which I haven't explored yet and defined in RFC and that is with this RFC, the probably needs., well, there definitely needs to be the definition on how property exists and these kind of things should work, and also other methods that check about. There's also isset() and unset(). Needs to be defined how this should work, so I don't know currently about any clear BC breaks per se, currently I'm only aware of it's being the case once people use it. For existing code I haven't seen any BC breaks yet, but for any code that starts to use it. You will need to adapt your, your reflection code potentially, and we'll need to look into how propertyExists(), and isset() and so on will behave. Derick Rethans 14:41 What's the feedback been so far? André Rømcke 14:43 From others in Symfony slash CMS is part of the realm, it's kind of positive. They would very much like this kind of features to easily more granularly say how this property should be should be accessed. So that's a definite need out there. There are a few language purists maybe, or whatever we should call it, that are clearly against this and allowing this kind of pragmatic access to define whatever you want. While I kind of understand it, and to some extent agree with it, I still think there's usefulness in defining the underlying semantics, on the, on the access to the properties. Yes for immutable, it's not right there yet, you could see the future where we have, if we use the words from from C sharp, it could be public:init for instance, or something like that, whatever the syntax is to define that this is public property, but it's only writable in constructor and potentially destructor and so on. So there's definitely missing a lot here to enable the full immutability semantics. Personally, I think this could be the underlying semantics of immutability, even though that would be that maybe the keyword we expose some promote more to use for for entities and grant uses. Derick Rethans 16:05 I don't have to ask you then when you think you'll be putting up this to vote if if you end up retargeting this for PHP 8.1, because you have a full year to go for that. André Rømcke 16:14 To be honest, I'm not a C developer. So, even though I think I got some pointers that you for trying to debug the session code in PHP but didn't go beyond that. So, unless anyone volunteers to code the patch for this in the next few weeks I don't see us moving this to that point right now. Derick Rethans 16:34 I mean, this could be a good starting point for exploring how to implement all the concepts that you were talking about such as immutability and readonly concepts and things like that. In that that's always we're doing it right and it's not a bad thing that takes a longer time than just the minimum discussion period of two weeks. All these complex interactions are always going to be very difficult to sort out anyway. André Rømcke 16:56 On one side of course I would like to have this as soon as possible, but I've been waiting since 2011 so I can wait another year. Derick Rethans 17:03 What is a year on a decade. Alright André would you have anything else to add? André Rømcke 17:08 No. One thing to add this, I, no matter what we end up with here there is definite need to avoid the boilerplate code, dealing with magic methods and also the performance it of using those. It would be great and simpler to get into the language without this stuff. Derick Rethans 17:25 Thank you, André for taking the time this morning to talk to me, as I said, I think this will be a good starting point for a somewhat longer discussion. André Rømcke 17:32 Thank you as well this was a great opportunity to meet you again of course, and also great to be on your podcast. It's a nice podcast to follow. Derick Rethans 17:43 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Property write/set visibility Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 62: Saner Numeric Strings

July 16, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 62: Saner Numeric Strings London, UK Thursday, July 16th 2020, 09:25 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed to make PHP's numeric string handling less complicated. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:17 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 62. Today I'm talking with George Peter Banyard about an RFC that he's proposing called saner numeric strings. Hello George, how are you this morning? George Peter Banyard 0:36 How are you; I'm doing fine. I'm George Peter Banyard. I work on PHP, and I'm currently employed by The Coding Machine to work on PHP. Derick Rethans 0:46 I actually think I have a bug swatter from The Coding Machine, which is hilarious. Huh, I can't show you that okay of course in a podcast and not on TV. But yes, I think I got it in Paris at some point at a conference there, and it's been happily getting rid of flies in my living room. Anyway, that's not what we want to talk about here today, we want to talk about the RFC that is made, what is the problem that is RFC is hoping to address? George Peter Banyard 1:09 PHP has the concept of numeric strings, which are strings which have like integers or floats encoded as a string. Mostly that would arrive when you have like a get request or post request and you take like the value of a form, which would be in a string. Issue is that PHP makes some kind of weird distinctions, and classifies numeric strings in three different categories mainly. So there are purely numeric strings, which are pure integers or pure floats, which can have an optional leading whitespace and no trailing whitespace. Derick Rethans 1:44 Does that also include like exponential numbers in there? George Peter Banyard 1:48 Yes. However trailing white spaces are not part of the numeric string specification in the PHP language. To deal with that PHP has a concept of leading numeric strings, which are strings which are numeric but like in the first few bytes, so it can be leading whitespace, integer or float, and then it can be whatever else afterwards, so it can be characters, it can be any white spaces, that will consider as a leading numeric string. The distinction is important because PHP will sometimes only accept pure numeric strings. But in some other place where we'll accept leading numeric strings. Of casts will accept whatever string possible and will try to coerce it into an integer. In weak mode, if you have a type hint. It will accept leading numeric strings, and it will emit an e_notice that a non well formed string has been encountered. When you use like a purely string string, you'll get a non numeric string encountered warning. So the main issue with that is that like strings which have a leading whitespace are considered more numeric by PHP than strings with trailing whitespaces. It is a pretty odd distinction to make. Derick Rethans 3:01 For me to get this right, the numeric string in PHP can have whitespace at the start, and then have numbers. There's a leading numeric string that can have optional whitespace in front, numbers and digits, and then rubbish. Then there's a non numeric string which never has any numbers in it. George Peter Banyard 3:22 No numbers in the beginning. "HelloWorld5" will be considered non numerical. Derick Rethans 3:26 So it's a string that doesn't start with digits. George Peter Banyard 3:29 Yes, or optional whitespace. Derick Rethans 3:31 So there are three different numeric strings, sort of. There're two, and then one that is a string that doesn't have numbers. And you mentioned that some places. These are accepted and in other places they're not. So typecast will accept both numeric strings and leading numeric strings. Where is the leading numeric string, not accepted? George Peter Banyard 3:53 If you use is_numeric call, it'll only return true on pure numeric strings. Derick Rethans 4:00 And they have whitespace ain the end? George Peter Banyard 4:02 They can only have leading white spaces. Explicit typecasting will work regardless, so even on non numeric strings, an int cast that will convert it to to zero, because that's how tight juggling works in PHP, and it will do. American leading numeric strings, it will take us to the initial leading numeric. Derick Rethans 4:27 And stripping out leading whitespace if there's any? George Peter Banyard 4:30 Strip stripping leading white spaces and stripping garbage out of the end if it's a just a leading numeric string. String to string comparison with the double equal comparison operator will perform a numeric compare comparison, only if both strings are numeric, purely numeric. Whenever you do a string to int, or float comparison, the string will get type juggled to an int or to a float, regardless of its numericness. So, we'll get non numeric string for get typecast into zero implicitly, and you'll get warnings, but it has some odd behaviour. In weak typing mode, so strict types disabled, an int typecast where an int type declaration for an argument. When you pass it an numeric string to it. If it's a leading numeric string, it will convert it was an E notice, and it will do a type error if it's a non numeric string. This can be a slight issue, if you for example you pass in a hash, it should be a string. As always, but it starts with like a digit, then it will get type juggled to an int. And it will pass the type declaration check and just like work with. Derick Rethans 5:54 And you're get a notice? George Peter Banyard 5:56 So you get a notice. Whereas like if it's, if it would be an a hash was just purely which starts with a with a character, you would get an e_warning, as in like a non well formed string like numeric string has been encountered. Derick Rethans 6:10 That sounds quite complicated. You mentioned that there's one other place where you can use numeric strings, which is in array keys. George Peter Banyard 6:21 Yes, array keys and string offsets. So array keys have a special semantic, which are like integer strings, which are separate concept and kind of same; as in, it needs to start with a nonzero digit, or be zero. For the zero index. It needs to be only digits, and that will be interpreted as an integer key. Otherwise, anything else will be interpreted as a string key, "5.5", which is a float like a numeric float string, will stay as "5.5" as the array key. This behaviour is different to string offsets. Derick Rethans 7:07 So you're saying that a string with "5.5" in it, in array key stays "5.5"? George Peter Banyard 7:15 Yes, and the same if you have a string key which is "03", you'll get a string key which is "03", it won't get evaluated as three. You can try it yourself, because it is the most weirdest behaviour, ever. I got what's quite surprised about that. Derick Rethans 7:32 You are correct, but if it's a float it gets truncated. George Peter Banyard 7:36 Yes, to five. Derick Rethans 7:38 Hey, I've learned something new here, I thought that would also truncate. George Peter Banyard 7:41 That would be kind of logical, in some sense, but it doesn't. Derick Rethans 7:46 Continuing George Peter Banyard 7:47 Array offsets have this behaviour, string keys have the more usual behaviour of using numerical, like numeric strings, as there can't be a string offset first, like it can only be like an integer. So that's why it's more lax, in some sense, it will use the usual semantics. However, if the numeric string is a float, or if it's a leading integer string, it'll emit the illegal string offset warning, but still used explicit int cast to cast it to an integer. "2str" would be cast to two, like a string index "foo" would be casted to zero, and "5.5" would be cast it to 5. It's all kind of confusing I wish doesn't follow other illegal offset behaviour for some sentence. If you try to pass an array as a as an offset you'll get a type error in PHP 8. Derick Rethans 8:55 I have to admit, I am totally getting lost here. This sounds also complicated, and that something needs to be done about this. Am I correctly understanding that this is exactly what your RFC is trying to do? George Peter Banyard 9:08 Yes, this is an attempt to bring back sanity into this whole mess. Derick Rethans 9:13 So what are you proposing here? George Peter Banyard 9:14 The proposal is to get rid of the concept of leading numeric strings, because it's mostly weird, and it's more confusing than it needs to be. To do that, numerical strings, will accept trailing white spaces. So numeric string which has leading whitespace won't be more numeric than a string with trailing white spaces. On top of that, all current, e_notices a non well formed numeric value encountered, will be changed to emit a non numeric value encountered e_warning. There's a promotion and severity in some sense as well. Should only affect purely non numeric strings, or leading numeric strings with have jibberish after the digit. For string offsets, numeric strings which correspond to well formed floating point numbers will emit the more usual string offset cast occurred warning, instead of the illegal string offset. Leading numeric strings which currently emit a non well formed numeric value and countered notice will emit the illegal string offset, and still continue to evaluate the previous value to ease the migration to PHP eight and for backwards compatibility. However, non numeric strings, which don't represent a number at all. Now throw in an illegal offset type error. This would affect our estimates operation on strings, so plus minus, multiplication, etc. Then float type declarations. So, in turn, float type declaration for internal and user land functions. Comparisons operator which considered that numeric strings with trailing white spaces weren't numeric, and so would produce false, say for example, the string "123 ", equal, equal to string " 123" will now produce true instead of false. The built in is_numeric function would return true for numeric strings which have trailing white spaces, where before it would emit false. And the plus plus, minus minus, increment, decrement operators would convert numeric strings with trailing white spaces to integers or floats and use the numerical increment instead of the alphanumeric would increment rules. Derick Rethans 11:35 You say whitespace, do you just mean the space characters or does it include like tabs and returns as well? George Peter Banyard 11:43 Tabs, new lines vertical ,spaces. Mostly what would consider white spaces. Derick Rethans 11:48 I guess there's a horizontal tab and a vertical tab and stuff like that. What's the potential for for breaking changes here because messing around with PHP's type juggling rules is always a bit tricky. What are the BC implications here? George Peter Banyard 12:05 I would expect most reasonable code to not be affected. It changes, one which is relatively minor, which is, if you, for some reason, your code needs the string to be numeric and only have leading white spaces, but no trailing white spaces, which is a pretty specific requirement. Then accepting trailing white spaces would break that code, because that would be considered a valid numeric string, whereas the code assumes that would be non non well formed, which is an odd requirement to have. That's why I don't expect it to be that big. Second one, more problematic one, is code which has liberal use of leading numeric strict. If for example you pass the DOM, an XML or a CSS file or something, and you get 2px, for example, for 2 pixel. And you just take that string, and dump it into various things and expect it to get two out of it. Sometimes you will need to now use an explicit cast to get the previous behaviour. That would be notified by you or by the by an e_notice in PHP 7.4, and it would it would inform you with a e_warning in PHP 8. Derick Rethans 13:28 Considering you get a warning ish thing in both cases it's not really a BC break, I mean it's not suddenly going to start throwing an exception, which could break your code flow for example. George Peter Banyard 13:39 Yes, and also all behaviour should be identical to PHP 7.4 and PHP 8. If there wasn't a warning before, if it was a notice, and it's been moved to a warning, the behaviour should be the same, except for like non numeric strings which sometimes will emit a type error, that's most likely a bug, were you expecting something to be an integer like and it's just pure or strict. Derick Rethans 14:07 Oh, of course for user input, we know we shouldn't casting anyway, we should use the filter extension to get to this data, does this impact the filter extension at all? George Peter Banyard 14:19 No, I don't think so. I don't think the filter extension uses the C is_numeric, is_numeric_string function. And it uses its own parsing of strings. Derick Rethans 14:30 Have you gotten any feedback about this so far? George Peter Banyard 14:33 Some feedback was to clarify some of the changes if it would affect code. Also, I had some doubt about how to handle the string offset case, which initially one of the proposals was to promote the leading number of strings to emit the warning, but also returned zero instead of returning the previous value, which would be pretty hard to detect, although they emitted a notice previously. So I've changed that again to like more in line with the behaviour, it has in PHP seven, where it just truncates the gibberish and cast it to an integer. So at least that BC concern should be removed. Derick Rethans 15:24 As I mentioned, this is all pretty hard to wrap my head around, not because you don't explain this correctly, but mostly because it's so complicated to begin with. I would probably recommend that people that listen to this podcast episode would also have a look at the RFC, because it will come with examples in the cases as well, and sometimes just looking at the examples is a lot easier than listening to the exact descriptions of strengths as parsed by the PHP engine. George Peter Banyard 15:53 Yes, which, at time can be mostly weird and nonsensical, but mostly based on Perl semantics. Derick Rethans 16:02 Sometimes we steal from Java, sometimes we steal from Rust, and sometimes some Perl it seems them. And there's nothing wrong with that. George Peter Banyard 16:10 There's nothing wrong, and in some sense, if you steal all the good things you get a better language, and sometimes you make some slight mistakes along the way. Derick Rethans 16:19 let me not start about the @@ operator. We'll keep that for another episode, maybe. George Peter Banyard 16:25 Yes. Derick Rethans 16:26 When do you think you're going to put this up for a vote? George Peter Banyard 16:29 So I started the discussion early this week. So on the 29th of June. I would expect the two weeks discussion period, because feature freezes coming up pretty soon. It needs to be voted on before and implemented into core before that. Voting should start on the 13th of July for two weeks until the 27th, which would give like another week to land stuff; to land it into core and tweak the implementation details. Derick Rethans 16:59 I'm expecting a lot more RFCs just wanting to get in, just before the deadline. George Peter Banyard 17:05 I suppose so, it's also kind of difficult because getting really tight. Derick Rethans 17:09 Okay, George. Thanks for this. Would you have anything else to add? George Peter Banyard 17:13 No, thanks for having me on the show again Derick, and I hope you have a nice evening. Derick Rethans 17:17 Thanks very much. Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Saner numeric strings Related Saner string to number comparisons RFC Related Permit trailing whitespace in numeric strings RFC Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 61: Stable Sorting

July 09, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 61: Stable Sorting London, UK Thursday, July 9th 2020, 09:24 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Stable Sorting RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 61. Today I'm talking with Nikita Popov about a rather small RFC that he's proposing called stable sorting. Hello Nikita, how are you this morning? Nikita 0:36 Hey, Derick, I'm great. How are you? Derick Rethans 0:38 Not too bad myself. Let's jump straight in here. The title of the RFC is stable sorting, what does that mean, what is stable sorting, or what is sorting stability? Nikita 0:48 Sorting stability refers to the behaviour of the sort when it comes to equal elements. And equal share means that we sort comparison function. For example, the one you pass to usort says the elements are equal, but there is still some way to distinguish them. For example, if you're sorting some objects, to take the example from the RFC, we have an array with users, and users have an age, and we use usort to only sort the users by age. Then according to the comparison callback all users with the same age are equal. But of course, the user also has other fields on which we can distinguish it. And the question is now in what order will equal elements appear. If we have a stable sort, then they will appear in the order they were originally in. So it's something not going to change. Derick Rethans 1:41 And that is not what PHP sorting mechanism currently does? Nikita 1:44 Right. PHP currently uses an unstable sort, which means that the order is simply unspecified. It will be deterministic. I mean if you take the same input array and sort it, then every time we will get the same result. But there is no well specified order or relative order of elements. There's just some order. The reason why we have this behaviour is that well there are, I would say, two, the only two sorting algorithms. There is merge sort. Which is a guaranteed n log n sort that the stable, but has the disadvantage that that requires additional memory to perform the merge step. The other side there is a quicksort, which is an average case n log n sorting algorithm and is unstable, but does not require any additional memory. And in practice, everyone uses one of these algorithms, usually with a couple of extensions on sort of merge sort. Nowadays we use timsort, but which is still based on the same underlying principle, and for quicksort, we have sort which is better than quicksort, which tries to avoid some of the bad worst case performance which quicksort can have. PHP currently uses us a quicksort, which means that our sorting results are unstable. Derick Rethans 3:07 Okay, and this RFC suggesting to change that. How would you do that? How would you modify quicksort to make it stable? Nikita 3:15 Two ways. One is to just change the sorting algorithm. So as I mentioned, the really popular stable sorting is timsort, which is used by Python by Java and probably lots of other languages at this point. And the other possibility is to stick with an unstable source. So to stick with quicksort, but to artificially enforce that the comparison function does not have, does not report equal elements that are not really equal. And we can do that by introducing an extra artificial fallback comparison. We remember the order of the elements in the original array. And as the comparison function tells us that elements are equal. You will check against this original order, which means that, okay are sort of still unstable, but because the comparison, we'll never actually report that two elements are equal unless they really equal. It doesn't matter for the result. Derick Rethans 4:16 So you're basically artificially changing the key to have the original index in the array. Nikita 4:24 That's pretty much exactly the implementation. And this is actually also how you would implement the stable sort if you'd do it in PHP code. So you would take your array and convert it into an array of pairs, where you have the original array value and the original position of the array element. Difference is just that if you do this in PHP code this is extremely extremely inefficient, in terms of memory and performance, while when we do it internally it's essentially free because we already have a little bit of unused space in each array element. We can easily store the current position. Derick Rethans 5:02 Do you think there will be much of a performance hit here? Nikita 5:04 So I expect that there is a bit of performance hit, but for typical usage, not much. For the good case where your array does not actually contain any equal elements, the overhead should be very small, something like maybe one or 2%,. If your array does contain a huge number of duplicates. Then there is more overhead, and the effect is basically that the sort performance, no longer depends on the number of duplicates you have. Previously if you had a lot of duplicates, then the sort became faster, the more duplicates you had. Well now, as you add more duplicates, the sorting performance will stay both stable. That's really the difference in performance. Derick Rethans 5:53 If you have the numbers in the RFC I'll make sure to link to them. There are possibility that is that this is going to break any code? Nikita 6:01 Yes, it could break tests. Derick Rethans 6:04 Tests, because the test's output can change because the sorting order of arrays might have changed. Nikita 6:11 Exactly. So we already had such a change in PHP seven, where we switched from a pure quicksort, to a hybrid quicksort and insertion sort, which means that effectively we have a stable source for arrays smaller than 16 elements and an unstable source for larger arrays, which is weird, weird intermediate state. Derick Rethans 6:33 Yes. Nikita 6:35 I think that one already had quite a bit of fallout for testing purposes. Hopefully this one will be a little bit smaller because most tests will work on a few elements. Those would have already been stable previously. But there is definitely going to be a little bit of fallout for unit testing. Derick Rethans 6:56 At the moment we're talking about this, the RFC's already up for voting. By the time this podcast has come out. It's pretty likely that it has been accepted for PHP eight, because I think the voting was 51 to zero or something like this. Nikita 7:10 It's 36 to zero. Derick Rethans 7:13 There you go. Thank you, Nikita for taking the time this morning to talk to me about stable sorting. Nikita 7:19 Thanks for having me. Derick Rethans 7:23 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Stable Sorting Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 60: OpenSSL CMS Support

July 02, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 60: OpenSSL CMS Support London, UK Thursday, July 2nd 2020, 09:23 BST In this episode of "PHP Internals News" I chat with Eliot Lear (Twitter, GitHub, Website) about OpenSSL CMS support, which he has contributed to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 60. Today I'm talking with Eliot Lear about adding OpenSSL CMS supports to PHP. Hello Eliot, would you please introduce yourself. Eliot Lear 0:34 Hi Derick, it's great to be here. My name is Eliot Lear, I'm a principal engineer for Cisco Systems working on IoT security. Derick Rethans 0:41 I saw somewhere on the internet, Wikipedia I believe that he also did some RFCs, not PHP RFC, but internet RFCs. Eliot Lear 0:49 That's correct. I have a few out there I'm a jack of all trades But Master of None. Derick Rethans 0:53 The one that piqued my interest was the one for the timezone database, because I added timezone support to PHP a long long time ago. Eliot Lear 1:01 That's right, there's a whole funny story about that RFC, we will have to save it for another time but there are a lot of heroes out there in the volunteer world, who keep that database up to date, and currently the they're corralled and coordinated by a lovely gentleman by the name of Paul Eggert and if you're not a member of that community it's really a wonderful contribution to make, and they need people all around the world to send an information but I guess that's not why we're here today. Derick Rethans 1:29 But I'm happy to chat about that at some other point in the future. Now today we're talking about CMS support in OpenSSL and the first time I saw CMS. I don't think that means content management system here. Eliot Lear 1:41 No, it stands for cryptographic message syntax, and it is the follow on to earlier work which people will know as PKCS#7. So it's a way in which one can transmit and receive encrypted information or just signed information. Derick Rethans 1:58 How does CMS, and PKCS#7 differ from each other. Eliot Lear 2:03 Actually not too many differences, the externally the envelope or the structure of the message is slightly better formed, and the people who worked on that at the Internet Engineering Task Force were essentially just making incremental improvements to make sure that there was good interoperability, good for email support and encrypted email, and signed email, and for other purposes as well. So it's very relatively modest but important improvements, from PKCS#7. Derick Rethans 2:39 How old are these two standards? Eliot Lear 2:42 Goodness. PKCS#7, I'm not sure actually of how old the PKCS#7 is, but CMS dates back. Gosh, probably a decade or so I'd have to go look. I'm sorry if I don't have the answer to that one, Derick Rethans 2:56 A ballpark figure works fine for me. Why would you want to use CMS over the older PKCS#7? Eliot Lear 3:02 You know, truthfully, I'm not, I'm not a cryptographer, so the reason I used it was because it was the latest and greatest thing and when you're doing this sort of work. I'm an, I'm an interdisciplinary person so what I do is I go find the experts and they tell me what to use. And believe it or not, I went and found the person who's the expert on cryptographic signatures, which is what I need. I said: What should I use? He said: You should use CMS and so that's what I did. What I ran into some troubles though, which is that some of the tooling, doesn't support CMS. So, in particular PHP didn't support CMS. So that's why I got involved in the PHP project. Derick Rethans 3:40 You are a new contributor to the PHP project. What did you think of its interactions? Eliot Lear 3:45 I had a wonderful time doing the development. There was a fair amount of coding involved, and one has to understand that the underlying code here is OpenSSL and OpenSSL's documentation for some of its interfaces could stand a little bit of improvement. I needed to do a fair amount of work and I needed a fair amount of review so I got a lot of support from Jakub particular, who looks after the OpenSSL code base, as one of the maintainers, and I really enjoyed the CI/CD integration, which allowed me to check the numerous environments that PHP runs on. I really enjoyed the community review, and I really enjoyed it even though I didn't have to really do one in my case, I did do an RFC, as part of the PHP development process, which essentially forced me to write really good documentation or at least I hope it's really good. Before all of the caller interfaces that I defined, so it was a really enjoyable experience. I really liked working with the team. Derick Rethans 4:47 That's good to hear. I think sometimes although an RFC wasn't particularly necessary here, as an RFC one particularly necessary I always find writing down the requirements that I have for my own software, first, even though this doesn't get publicized or nobody's going to review that always very useful to just clear my head and see what's going on there. Eliot Lear 5:06 Yeah, I think that's a good approach. Derick Rethans 5:07 During the review, was there a lot of feedback where you weren't quite sure, or what was the best feedback that you got during this process? Eliot Lear 5:15 Biggest issue that we had was, how to handle streaming, and we have some code in there now for streaming, but it's it's unlikely to get really heavily exercised in the way that the interfaces are defined right now. It's essentially files in/files out interface which mirrors the PKCS#7 interface. One of the future activities that I would like to take on if I can find a little bit more time, is to move away from the files in/files out interface, but rather use an in memory structure or in memory interface. So that can actually take advantage of streaming and can be more memory efficient, over time. Derick Rethans 5:56 When you say file now you actually provide a file name to the functions? Eliot Lear 6:00 That's right, you know, depending on which of the interfaces you're using, there's an encrypt, there's an encrypt call there's a decrypt call. There's a sign and a validate call, and or a verify call, and each of them has a slightly different interface, but you know if you're encrypting you need to have the destination that you're encrypting through these are all public key, you know PKI based approaches so you have to have the destination certificates, that you're sending. If you're verifying you need to have the private key to do or you need, I'm sorry you need to have the public key chain and if you're decrypting to have the private key to do all this. So, but they're all filenames that are passed and it's a bit of a limitation of the original interface in that you probably don't really want to be passing file names from most of your functions you'd rather be passing objects that are a bit better structure than that. Derick Rethans 6:53 Is the underlying OpenSSL interface similar or does that allow for streaming in general? Eliot Lear 6:59 The C API allows for streaming in such. The command line interface, it doesn't seem to me that they do any particular things with with streaming. If you look at the cryptographic interface that we that we did for CMS, mostly it is an attempt to provide the capability that you would otherwise have on the open using the OpenSSL command line interface and I think the nice thing here is that we can evolve from that point. Derick Rethans 7:26 And the progress wouldn't only be done implemented for the CMS mechanism, but also for PKCS#7, as well as others that are also available. Eliot Lear 7:35 Yes. Another area that I would like to look at, I'm not sure how easy it will be, we didn't try it this time was to try and combine the code bases because they are so close, and be a little bit more code efficient, but there are just slight enough differences in the caller interfaces between PKCS#7 and CMS that, I'm not sure I could get away with using void functions for everything I have. I might have to have a lot of switches, or conditionals in the code. But what I am interested in doing for both sets of code is, again, providing new interfaces, where instead of passing file names, you're passing memory structures of some form that can be used to stream. That's the future. Derick Rethans 8:22 I've been writing quite a bit of GO code in the last couple of months. And that interface is exactly the same, you provide file names to it, which I find kind of annoying because I'm going to have to distribute these binaries at some point. And I don't really want any other dependencies in the form of files, so I need to figure out a way how to do that without also provide those key files at some point. Eliot Lear 8:43 Indeed, that's, that's an issue, and for us right well who are web developers I did this because I was doing some web development. A lot of the stuff that I want to do. I just want to do in memory and then pass right back to the client and I don't really want to have to go to the File System. And right now, I'll have to take an extra step to go to the File System and that's alright, it's not a big deal, but it'll be a little bit more elegant when I get away from that. We'll do that you know at an appropriate time. Derick Rethans 9:11 Yes, that sounds lovely. I'm not an expert in cryptography either. I saw that the RFC mentions the X 509. How does it tie in with CMS and PKCS #7? Eliot Lear 9:21 X 509 is essentially a certificate standard. In fact, that's what really what it is. A certificate essentially has a bunch of attributes, along with a subject being one of those attributes and a signature on top of the whole structure. And the signature comes from a signer, and the signer is essentially asserting all of these attributes on behalf of whoever sent the request. X 509 certificates are, for example the core of our web authentication infrastructure. When you go to the bank online, it uses an X 509 certificate to prove to you that it is the bank that you intended to visit, that's the basis of this and CMS and PKCS#7 are structures that allow the X 509 standard to be serialized, so there's the distinguishing coding rules that are used underneath PKCS#7 and CMS, and then what you have, CMS essentially was designed as at least in part for mail transmission. So how is it that you indicate the certificate, the subject name, the content of the message. All of this information had to be formally described, and it had to be done in a way that is scalable. And the nice thing about X 509, as compared to say just using naked public keys, is with naked public keys, the verifier or the recipient has to have each individual public key, whereas with X 509, it uses the certificate hierarchy such that you only need to have the top of the chain, if you will, in order to validate a certificate. So X 509 scales, amazingly well, we see that success, all throughout the web. And so that's what CMS and PKCS#7 help support. Derick Rethans 11:24 Like I said, I've never really done enough research into this but I think it is something that many web developers should really know how that works because this comes back, not only with mail, but also with HTTPS. Eliot Lear 11:35 It's another part of the code right. So CMS isn't directly used for supporting TLS connections, there's a whole a whole set of code inside of PHP for that. Derick Rethans 11:44 Would you have anything else to add? Eliot Lear 11:46 I would say a couple of things. The basis of this work was that I was attempting to create signatures for something called manufacturer usage descriptions. The reason I got involved with PHP is that I'm doing tooling that supports an IoT protection project. And this this manufacturer usage descriptions essentially describes what the device, what an IoT device needs in terms of network access. And the purpose of using PHP and adding the code that I added was so that those descriptions could be signed, and that's why Cisco, my employer, supported my activity. Now Cisco loves giving back to the community. This was one way we could do so it's something I'm very proud of when it comes to our company. And so we're very happy to participate with the PHP project. I really enjoyed working with Derick Rethans 12:33 That's glad to hear. I'm looking forward to some other API improvements because I agree that the interfaces that the OpenSSL extension has aren't always the easiest to use and I think it's important that encryption is easy to use, because more people will use it right. Eliot Lear 12:49 I have to say, in my opinion, the encryption interfaces that we have today are still relatively immature. And not just CMS, the code that I wrote, which is really you know fresh it just got committed, but the whole category of interfaces, is something that will evolve over time and it's important that it do so because the threats are evolving over time and people need to be able to use these interfaces, and we can't all be cryptographic experts, I'm not. I just use the code but I needed to write some in order to use it in my case, but as we go on I think will enjoy richer and easier to use interfaces that normal developers can use without being experts. Derick Rethans 13:38 PHP has been going that way already a little bit because we started having a simple random interface, and in a simple way of doing hashes and verifying hashes, to make these things a lot easier because we saw that lots of people are implementing their own ways in PHP code, and pretty much messing it up because, as you say not everybody's a cryptographer. Eliot Lear 13:56 That's right. And so that's a really good thing that PHP did, because as you pointed out, it eliminates all the people who are going onto the net looking for the little snippet of code that they're going to include in PHP, whether that snippet is correct or not that's a big issue. Derick Rethans 14:11 Absolutely. And cryptography is not something that you want to get wrong. Eliot Lear 14:15 That's right, because for every line of code that you've written in this space, there's going to be somebody who's going to want to attack it, maybe several. Derick Rethans 14:23 Absolutely. Thank you, Eliot, for taking the time this morning to talk to me about CMS support. Eliot Lear 14:28 It's been my pleasure Derick, and thanks for having me on. And again, it was really enjoyable to work with the PHP team and I'm looking forward to doing more. Derick Rethans 14:38 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Add CMS Support Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 59: Named Arguments

June 25, 2020 00:00 0.0 MB Downloads: 0

PHP Internals News: Episode 59: Named Arguments London, UK Thursday, June 25th 2020, 09:22 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Named Parameter RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 59. Today I'm talking with Nikita Popov about a few RFCs that he's produced. Hello Nikita, how are you this morning? Nikita Popov 0:35 Hey Derick, I'm great. How are you? Derick Rethans 0:38 Not too bad, not too bad today. I think I made a decision to stop asking you to introduce yourself because we've done this so many times now. We have quite a few things to go through today. So let's start with the bigger one, which is the named arguments RFC. We have in PHP eight already seen quite a few changes to how PHP deals with set up and things like that we have had an argument promotion in constructors, we have the mixed type, we have union types, and now named arguments, I suppose built on top of that, again, so what are named arguments? Nikita Popov 1:07 Currently, if you're calling a function or a method you have to pass the arguments in a certain order. So in the same order in which they were declared in the function, or method declaration. And what named arguments or parameters allows you to do is to instead specify the argument names, when doing the call. Just taking the first example from the RFC, we have the array_fill function, and the array_fill function accepts three arguments. So you can call like array_fill( 0, 100, 50 ). Now, like what what does that actually mean? This function signature is not really great because you can't really tell what the meaning of this parameter is and, in which order you should be passing them. So with named parameters, the same call would be is something like: array_fill, where the start index is zero, the number is 100, and the value is 50. And that should immediately make this call, like much more understandable, because you know what the arguments mean. And this is really one of the main like motivations or benefits of having named parameters. Derick Rethans 2:20 Of course developers that use an IDE already have this information available through an IDE. But of course named arguments will also start working for people that don't have, or don't want to use an IDE at that moment. Nikita Popov 2:31 At least in PhpStorm, there is a feature where you can enable these argument labels for constants typically only. This would basically move this particular information into the language, but I should say that of course this is not the only advantage of having named parameters. So making code more self documenting is one aspect, but there are a couple couple more of them. I think one important one is that you can skip default values. So if you have a function that has many optional arguments, and you only want to say change the last one, then right now you actually have to pass all the arguments before the last one as well and you have to know: Well, what is the correct default value to pass there, even though you don't really care about it. Derick Rethans 3:19 If I remember correctly, there are a few functions in PHP's standard library, where you cannot actually replicate the default value with specifying an argument value, because they have this really complex and weird kind of behaviour. Nikita Popov 3:33 That's true, but that's something we're trying to eliminate in PHP eight mostly. Derick Rethans 3:39 And of course additional you'd never have to remember, whether in_array and array_search have needle or haystack first, which is also beneficial. Nikita Popov 3:46 That's true. Yeah. Derick Rethans 3:48 You mentioned that there are a few other benefits as well. You mentioned self documenting and the skipping of arguments, what other benefits are there? Nikita Popov 3:54 The other part is that you can also reorder the parameters. So this varies a little bit by language. In some languages you're required to still pass the arguments in the same order. They were declared, even if you're using name parameters. But for the purposes of PHP, you would allow passing them in arbitrary order. Just like you said you don't have to remember if the haystack is first, or the needle comes first. And I think one case where all of these benefits, play together particularly well, is when it comes to object construction. So you already mentioned that we have the constructor promotion RFC in PHP eight, which makes it pretty simple to declare value objects. So you just list all the available properties and their default values and types, the constructor and you're done. But when you actually instantiate the object, you still have to, their ergonomics are not particularly good, because you have to remember in which order you have to pass the parameters, don't really know which parameters which just looking at the call. And once again, you have to specify everything and you can't just skip a few of them with default values. And if you have like a constructor with maybe five or six arguments coming in, which is maybe unusual for normal methods, but I think somewhat normal for constructors in particular, then the current development experience there is just not very nice. And named parameters would essentially provide us something akin to an object initialization syntax which is available in many other languages, and which has also been proposed for PHP, previously. But you would get this just as a side effect of combining constructors and named parameters, without having to define any kind of special semantics for how object construction works, and how initializer syntax interacts with constructors and so on. Derick Rethans 5:55 That ties in again with the object ergonomics that I spoke about with Larry earlier this season as well. Nikita Popov 6:01 Yeah, I believe that this combination of ,constructor promotion and named parameters for constructors was one of the things. Derick Rethans 6:10 We've spoken a little bit about what it is. Now, how would you use this in PHP, what is the syntax for that you're proposing? Nikita Popov 6:18 I mean syntax is always bike shedding question. The particular one, I am proposing for now is to save the parameter name as literal, so no dollar in front of it or something. And the colon and the value you want to pass. Derick Rethans 6:35 Is there any precedence for this syntax already, either in PHP or outside of PHP? Nikita Popov 6:41 In PHP, not really. I mean, PHP, we usually use the double arrow to have any kind of key value mapping. This is sort of key value mapping. In other languages, yes the syntax does exist. I'm actually not sure which languages exactly use it. Probably C sharp and Kotlin. Python uses just an equal sign. Well, there are a couple who use it. I actually initially use the double arrow syntax because it's more familiar with PHP, but I found that it's, there's not really read as nicely. And I also have some ideas on how we can, like, integrate this colon syntax, into the language in a more consistent way. Derick Rethans 7:27 I think I saw in the RFC that the only said the only way how you can do the keys is by literal and not by a variable. Nikita Popov 7:34 That's right. This is mainly just to avoid confusion. Well if you allow specifying a variable, then the question is, well, is this variable just the parameter name? Because I mean the signature, you also write this as a variable, or is it the variable that contains the parameter name like variable variables in PHP. So I think to sidestep that confusion, we just allow identifiers, but you can still use a variable parameter names from the argument unpacking syntax. Derick Rethans 8:04 How does that work? Nikita Popov 8:05 So PHP supports the three dots, the ellipsis operator, both in the function declaration, and for function calls. The declaration that just means collect all the trailing arguments. And the call, at the call, means that you get an array, and the elements of this array should be interpreted as function arguments. And parameters extend that by also allowing array keys. And if you unpack an array with string keys then those will be interpreted as parameter names, and we'll use the usual named parameters passing semantics. Derick Rethans 8:47 Interesting. I actually missed that, while reading the RFC. To be fair, I skimmed it, not really read tit. Yeah it's good to see that actually. Now people currently use positional arguments and not named arguments. How would these two interact. Nikita Popov 9:01 Mostly, the named parameters are just syntax for positional arguments, so we perform an internal transformation to convert named parameters into positional parameters. As far as both the engine is concerned and the callee is concerned. They don't really know about parameters that's all. They see usual positional call where all the missing arguments have been filled in with default values. I think the only part to watch out for there is exactly this case of variadics, because previously, the variadic parameter could only contain a list of arguments, and now it can also have string keys, or like left over named parameters. So which did not have a matching argument in the function signature so both will now get collected to the variadic parameter. Think that's like the only case where I know that the calling convention really changes for the recipient of the arguments. Derick Rethans 10:02 Because otherwise got a normal array they now get a bunch of things with potentially having keys in there as well. What would happen if I specify a named argument by name and also include it into the variadics? Nikita Popov 10:15 So generally the rule is always you can pass a parameter at most once you can have the situation where you first pass some positional arguments, and then you pass named arguments. If you do that this named argument cannot clash with the previous past positional argument, if you run in this kind of situation we will always throw an exception at that point. So you're not allowed to overwrite the previous argument, or something like that. Derick Rethans 10:42 Same would work that if a method would collect named arguments and also have the variadics array. In case you specify more arguments then the function would take. And, in the variadics you'd have that name again that would have already clashed before it even gets turned into variadic. Are the names that she gives to named arguments are case sensitive or case insensitive? Nikita Popov 11:04 They are case sensitive. Because the parameters you specify in the function are just variables and variables in PHP are case sensitive as well. Derick Rethans 11:14 At the moment if you inherit a method in a inheriting class, then it doesn't particularly matter what the names of these method arguments are. When you get now named arguments, is this going to change, because at the moment PHP doesn't enforce that the names of inheriting methods are of course clashing, or the same as the ones that are overriding in the parent class? Nikita Popov 11:37 This is one of the bigger open questions we have. The problem is that if you call a method with the names from the parent class, and the child class change them, then you'll get an error because this named parameter just doesn't exist in the child class. And there are a couple of ways to approach that one is to forbid during inheritance, any kind of parameter name changes, which would be a fairly significant backwards break because well, it never mattered in the past and based on some cursory analysis, this is like parameter name changes, somewhat common in code right now. The other possibility is to just ignore this issue, expect that a lot of code is never going to use name parameters. So using the parameters only makes sense with some types of methods. If you have a method that only accepts one argument can be pretty sure that no one's going to call it that has a name parameter, and there is the option of just ignoring this issue and fixing it as it comes up, more or less. Which is maybe not the most principled approach. But if we look at other languages that do make heavy use of parameters for example like Python. And we see that they also just ignore the problem. So it looks like in practice this does work out. Of course, a significant difference there is that Python has had in parameters for a long time already. We will be retrofitting them on an old language. So the situation is somewhat different and probably rather than more dangerous for us. Derick Rethans 13:14 This is something of course that static analysis tools can check for quite easily and I would argue that they probably should start doing that as well. Nikita Popov 13:22 This this right, so this is both something easy to check for, and also easy to automatically fix. Derick Rethans 13:28 Except that you need to choose which one is the correct name, of course. Nikita Popov 13:32 Yeah, that's right. But there is one more possibility, which is to allow the parameter names from both the parent method, and the child method. This will be like more or less a transparent way to fix that issue. The only problem you can run into this if both the parent method and the child method use the same parameter name but in a different position. If we would go with this option then we say that only in this particular case where parameter name is reused but different position that would become an inheritance error. Derick Rethans 14:04 I quite like that actually, because that's a pragmatic approach isn't it? Nikita Popov 14:07 I also quite like it, maybe it's just technically a bit problematic. Derick Rethans 14:11 I can already imagine that if this gets accepted for PHP eight, which of course not sure at the moment, that Xdebug is going to have to show the variadics already with the names array elements which of course it doesn't do yet because it has no notion of. But that's good to know to have a heads up on these things. PHP eight has already seen quite a lot of work for internal methods to get their names properly, recorded as well, so that types of stubs that you have already been working on. How does named arguments tie in with this? Nikita Popov 14:38 The actual named arguments proposal is already pretty old. It dates back to PHP 5.6, I think, and one of the open questions since then was how we handle internal control functions, because they don't really have a notion of default values. We have optional parameters, but the default value is not known to the engine, it's only known to the implementation. There are kind of ways to work around that. They are not really safe, so they will work for most functions, but for some which who like argument context, we might end up just crashing if this function is used with named parameters and particularly weird way. One of the nice things in PHP eight is that thanks to the stub effort we actually have default values for functions available as collectible meta data so it's available for reflection, and we will would also be able to use this for named parameters. If an internal function parameter has been skipped, we can essentially fetch it from reflection and fill in the value, the same way we would do for for normal user functions. The issue there is that this only works if there are stubs available. This works for all of our internal functions. I mean, not internal but bundled functions for PHP, but it will not work out of the box with old extensions. So it will mostly work, just this kind of parameter skipping is not going to work. So it will give you an error like okay we don't have default information for this function so you can't call it like this. Derick Rethans 16:17 There's this common myth saying that reflection is actually a very slow thing, you should never use this in your code. Is this going to be a concern for using reflection information this way for internal functions? Nikita Popov 16:29 Well, I mean the self like you will be directly using reflection, but internal API's that do the same thing. There is a performance concern here because we store the default values, not as values but as strings. So, in the worst case we actually have to parse those strings, convert them into a syntax tree, validate the syntax tree. That's all. That's of course slow, but it's not like we can't add a bit of caching in there to make sure this only happens once, at which point the problem should be avoided. Derick Rethans 17:02 Especially when you use things like opcache. Nikita Popov 17:04 I should say that I do expect name parameter calls to be generally slower than positional calls, so maybe in super performance critical code you would stick with the positional arguments. Derick Rethans 17:16 I mean it would work perfectly well so far object construction still right? Nikita Popov 17:19 For object construction the real cost is really in the object allocations so and so. Derick Rethans 17:24 With the introduction of named arguments aren't going to be any BC breaks, potentially? Nikita Popov 17:29 There are not going to be any direct BC breaks, but there are of course some concerns. The first one is the change I mentioned about the variadics. That variadics can now have string keys. But I should clarify what I mean by: no, no, BC breaks. If you don't use named arguments than nothing is going to break. But of course, if named arguments are used with code that did not expect them, then we can run into some issues. So that's one of the issues. And the other one is more of a like long term maintenance concern that if we introduce named parameters, then those parameters become significant to the API, which means you cannot rename parameter names in minor versions of a library if you're semver compatible. Because, you might be breaking some codes on using those parameter names. And I think one of the biggest concerns that has come up in the discussion is that this is a significant increase in the API burden for open source libraries. Derick Rethans 18:34 Because now suddenly, they have to think about the names of the arguments to all their methods as well, right. Nikita Popov 18:39 So I think, like, the merits of this proposal, mostly comes down to how much additional burden does this impose on people maintaining libraries versus how much like ergonomics improvements that we get out of the feature for everyone else. One more thing to consider is that named parameters really change how you design APIs or what APIs you can reasonably design. So right now if you have a method with, for example, three boolean arguments, that would be like a really horrible method, because you call it like, true, true, false, like what does this mean? If you have name parameters, and you have the same three boolean arguments, then it's not really a problem any more. So you can, of course, you say, what the argument means and you can leave out arguments that are that you don't want to modify. Derick Rethans 19:30 You mentioned that this RFC is quite old already. Do you think this will make it into PHP eight, as we're getting closer and closer to feature freeze, we're not quite there yet we have another month or so to go. Do you think it's ready enough to throw to the lions, so to speak? Nikita Popov 19:46 So I think I will at least give it a try, because I do think that PHP eight is a good target for such a change. Even though it nominally does not break backwards compatibility, it does have a very significant impact in practice, so it wouldn't be good to put this on a major version. And additionally, we also did all this work on stubs in PHP eight with this it'll also fits in very well. Oh, and finally, one thing I didn't mention before is that we get attributes in PHP eight. And attributes, firstly, replace the existing Doctrine annocation system, which already supports named parameter. For all the code that is now going to migrate from Doctrine Annotations to PHP Attributes, it would be helpful if we had named parameters, because it would make the migration a lot more straightforward, because you don't also have to change the meaning of the arguments at the same time. Derick Rethans 20:51 I'm curious to see what the reception of this will be, especially when it is going to be voted for. Nikita Popov 20:57 Yeah me as well. I never did get this to voting, the last time around, but we should at least get a vote this time and well if it doesn't go through then there is always next time. Derick Rethans 21:10 there's always next time yes. Okay Nikita Thank you for taking the time this morning to talk to me about named arguments. Nikita Popov 21:17 Thanks for having me Derick. Derick Rethans 21:20 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Named Parameters Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0