This is the 'PHP Internals News' podcast, where we discuss the latest PHP news, implementations, and issues with PHP internals developers and other guests.
PHP Internals News: Episode 88: Pure Intersection Types
London, UK Thursday, June 10th 2021, 09:16 BST
In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Pure Intersection Types" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:14 Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 88. Today I'm talking with George Peter Banyard about pure intersection types. George, could you please introduce yourself?
George Peter Banyard 0:30 Hello, my name is George Peter Banyard. I work on PHP core development in my free time. And on the PHP Docs.
Derick Rethans 0:36 This RFC is about intersection types. What are intersection types?
George Peter Banyard 0:40 I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tell you I want X or Y, whereas intersection types tell you that I want X and Y to be true at the same time. The easiest example I can come up with is a Traversable that you want to be Countable as well. So Traversable and Countable. Currently, you can do intersection types in very hacky ways. So you can either create a new interface which extends both Traversable and Countable, but then all the classes that you want to use in this fashion, you need to make them implement the interface, which might not be possible if you're using a library or other things like that. The other very hacky way of doing it is using references and typed properties. You assign two typed properties by reference, one being Traversable, one being Countable, and then you make your actual property a reference to both of these properties.
And then PHP will check: does the property respect type A of those references? If yes, move to the next one: does it respect type B? Which basically gives you intersection types.
Derick Rethans 1:44 Yeah, I saw that in the RFC. And I was wondering like, well, do people actually do that?
George Peter Banyard 1:49 The only reason I know that is because of Nikita's slide.
Derick Rethans 1:51 The thing is, if it is possible, people will do it, right. And that's how that works.
George Peter Banyard 1:56 Yeah, most of the time.
Derick Rethans 1:57 The RFC isn't actually called intersection types. It's called pure intersection types. What does the word pure do here?
George Peter Banyard 2:05 So the word pure here is not very semantic. But it's more that you cannot mix union types and intersection types together. The reasons for it are mostly technical. One reason is how do you mix and match intersection types and union types? One way is to have union types take precedence over intersection types, but some people don't like that and want explicit grouping all the time. So you need to do parentheses, A intersection B, close parentheses, pipe for the union, and then the other type. But I think the main reason is mostly the variance, like the variance checks for inheritance are already kind of complicated and kind of mind boggling.
Derick Rethans 2:44 I'm sure we'll get into the variance rules in a moment. What is it actually that you're proposing to add here? What is the syntax, for example?
George Peter Banyard 2:52 So the syntax is any class type with an ampersand, and any other class type gives you an intersection type, which is the usual way of doing "and".
Derick Rethans 3:01 When you say class types, do you also mean interfaces?
George Peter Banyard 3:04 Yes, PHP has a concept of class types, which are mostly any class and any interface. There's also a weird exception where parent and self are considered class types, but those are not allowed.
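A minimal sketch of the syntax George describes, assuming PHP 8.1 or later (the release this RFC targeted). The function name `countAndList()` is hypothetical; `ArrayIterator` is a built-in class that implements both `Traversable` and `Countable`, while a generator is only `Traversable`.

```php
<?php
// Requires PHP 8.1+. countAndList() is a hypothetical helper for illustration.
function countAndList(Traversable&Countable $items): array
{
    $values = [];
    foreach ($items as $item) {
        $values[] = $item;
    }

    // count() is safe here because the type guarantees Countable.
    return ['count' => count($items), 'values' => $values];
}

// ArrayIterator satisfies both halves of the intersection.
$result = countAndList(new ArrayIterator(['a', 'b', 'c']));

// A Generator is Traversable but not Countable, so the call is rejected.
$rejected = false;
try {
    countAndList((function () {
        yield 1;
    })());
} catch (TypeError $e) {
    $rejected = true;
}
```

No shared marker interface is needed on the classes being passed in, which is the limitation of the "new interface that extends both" workaround mentioned earlier in the conversation.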
Derick Rethans 3:20 Okay, so it's just the classes that you've defined and the classes that are part of the language, but not the special keywords self and parent and static, I suppose?
George Peter Banyard 3:28 Yes, the reason for that is standard types are not allowed to be part of an intersection, because nothing can be an integer and a string at the same time. Now, there are some built-in types for which that can be kind of true. You could have a callable which is a string, because callables can be arrays, or can be a closure. But that's very weird and not very great. The other one is iterable. If you expand that out, you get redundant types, which we can talk about later. And the final thing is that parent, self, and static just make for some very weird design questions, in my opinion. Like, if you ask for something to be an intersection with itself, you basically can only enforce conditions on subclasses. You have a class and you say: Oh, I want it to return self, but also be countable for some reason, but I'm not countable. So if you extend me, then you need to be countable, but I'm not. So it's very weird. parent has kind of the very same weird semantics where you can ask for a parent, but if the base class doesn't support it, and you ask for a parent to be an intersection, then you basically need the child to implement the interface and then a child to return the first child. If you do that, my main question is: why? Because I don't see any good reasons to do it. And it just makes everything harder.
Derick Rethans 4:40 You would only add it for the sake of completeness instead of it being useful. Let's move onwards. You've mentioned which types are supported, which is class names and interface names. You already hinted a little bit at redundant types. What are redundant types?
George Peter Banyard 4:56 Currently, PHP already does that with union types. If you repeat the type twice in a union, you'll get a compile error.
This only affects compile-time known aliases. If you use a use statement, then PHP knows that you're basically using the same type. However, if you use a runtime alias, then it can't detect that.
Derick Rethans 5:13 A runtime alias, what's that?
George Peter Banyard 5:15 So if you use the function class_alias.
Derick Rethans 5:16 It's new to me!
George Peter Banyard 5:18 It technically exists. It also doesn't guarantee that the type is minimal, because it can only see those in its own file. For example, if you say I want A and B, but B is a child class of A, then the intersection basically resolves to only B. But you can only know that at runtime if classes are defined in different files. So the type isn't minimal. But checking for redundant types is basically an easy way to check if you might be typing a bug.
Derick Rethans 5:46 You try to do your best to warn people about that. But you never know for certain.
George Peter Banyard 5:51 You never know for certain because PHP doesn't compile everything into like one big program like in check. A static analyser can help with that.
Derick Rethans 5:59 Let's talk a little bit about technical aspects, because I reckon that implementing intersection types is quite different from implementing union types. What kind of hacks did you have to make in the parser and compiler for this?
George Peter Banyard 6:11 The parser has been very weird. The parsing syntax should be the same as union types. So I just copy pasted what Nikita did. I tried it. It worked for return types without an issue. It didn't work with argument types, because bison, which is the tool which generates our parser, was giving a shift/reduce conflict, which basically tells you: Oh, I've got two possible states I can go in, and I don't know which branch I need to take, because the PHP parser only does one token of look ahead. It was conflicting because the ampersand is either for the intersection type or to mark a reference.
Normally, if the parser is more developed, or does more look ahead, it is not a conflict. And it shouldn't be. Ilia managed to come up with this ingenious idea, which is to just define the ampersand token twice with very complicated names, and just use them in different contexts. And bison just goes: now I have no issue. It is the same token, it is the same character. Now that you have two different tokens it manages to disambiguate its shift/reduce conflict. So that's very weird.
Derick Rethans 7:17 I'll have a look at what that actually does, because I'm curious now myself. Beyond the parser, I think the biggest and most complicated part of this is implementing the variance rules for these intersection types. Can you give a short summary of what variance rules are, and potentially how you've actually implemented them?
George Peter Banyard 7:38 Since PHP seven point four, return types are covariant, and parameter types are contravariant. Covariant means you can restrict, you can be more specific. And contravariant means you can be broader, or more generic. Union types already give some interesting covariance implications. Usually, you would think, well, a union is always broader than a single type. You say: Oh, I want either a Traversable or a Countable, it seems that you're expanding the type sphere. However, a single type can have a union type as a subtype. For example, you say: Oh, my base type is a class A, and I have two child classes, which are B and C. I can type covariantly that I want either B or C, because B or C is more specific than just A. That's what union types already allow you to do. And the way it's implemented, and how to check for that, is you traverse the list of child types, and check that each child type is an instance of at least one of the parent's types. An intersection, by virtue of you adding constraints on the type itself, will always be more specific than just a single type.
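A minimal sketch of the point that an intersection is more specific than a single type, assuming PHP 8.1 or later. All class names below (`Item`, `CountableItem`, `Factory`, `StricterFactory`) are hypothetical and exist only for illustration.

```php
<?php
// Requires PHP 8.1+. All class names here are hypothetical.
class Item
{
}

class CountableItem extends Item implements Countable
{
    public function count(): int
    {
        return 1;
    }
}

class Factory
{
    public function make(): Item
    {
        return new Item();
    }
}

class StricterFactory extends Factory
{
    // Covariant return type: Item&Countable adds a constraint to Item,
    // so it is narrower and the override is accepted by the engine.
    public function make(): Item&Countable
    {
        return new CountableItem();
    }
}

$made = (new StricterFactory())->make();
```

The child signature carries more types than the parent signature, which is the "child type can have more types attached to itself than a parent type" situation discussed in the conversation.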
If you say: Oh, I want a class A, then more specific is: I want something of class A and I want it to be countable. So you're already restricting this, which gives some very interesting implications, meaning that a child type can have more types attached to itself than a parent type. That's mostly due to how PHP implements its type system. To make the distinction, basically, I've added a flag, which is either this is a union, meaning that you need to check it is part of one, or it's an intersection. The thing with intersection types is that you need to reverse the order in how you check the types. So you basically need to check that the parent is at least an instance of one of the child types, but not that none of the child types is a super type of the parent type. Let's say you have class C, which extends class B, and class B extends class A. If, let's say, my base type for any function is B, and I give something which is A intersection T, with T any interface, this would not be a valid subtyping relation underneath B. Because if you look at it as a Venn diagram in some sense, you've got A which is this massive sphere, you've got B which is inside it, and C which is inside it. A intersection something intersects the whole of A with something else, which might also intersect with B in a subset, but it is wider than just B, which means the whole variance is very complicated in how you check it, because you can't really reuse the same loop.
Derick Rethans 10:13 I can't imagine how much more complicated this gets when you have both intersection and union types in the same return type or parameter argument type.
George Peter Banyard 10:22 That's one of the primary reasons why it's currently not in the RFC, because it is already mind boggling. Although I think it shouldn't be that hard to add support for it down the line, because I've already split it mostly up, so it should be easy to check: Oh, is this an intersection? Is this a union?
And then you need to branch.
Derick Rethans 10:42 Luckily, because standard types aren't included here, you also don't really have to think about coercive mode and strict mode for these types. Because that's simply not a thing.
George Peter Banyard 10:50 That's very convenient.
Derick Rethans 10:52 Is there future scope to this RFC?
George Peter Banyard 10:54 The obvious future scope is what I call composite types, where you have unions and intersections available in the same type. The main issue is mostly variance, because it's already complicated; adding more scope to it is going to make the variance even harder. I think with most programming languages, the variance code is always complicated to read. While I was researching some of it, I managed to hit a couple of failures, one of which I think was with Julia, and the research paper was just focusing on a specific subset and basically proving that it is correct. It's not a very big field. Professors at Imperial, whom I've talked to, have been kind of helpful with giving some pointers. They mostly work with basically proper languages or compiled languages, which have this whole other set of implications. Apparently, they have a bunch of issues about how you normalize the types into a canonical form, to make it easier to check. Which is probably one of the problems that will need to be addressed when you get such an intersection and union type. First, you normalize it to some canonical form, and then you work with it. But then the second issue is how do you want the composite types to actually be? Is it: oh, you have got parentheses when you want to mix and match? Or can you use union precedence? I've heard both opinions. Basically, some people are very dead against using union precedence.
Derick Rethans 12:14 My question is going to be, is this actually something people would use a lot?
George Peter Banyard 12:21 I don't think it would be used a ton.
The moment you want to use it, it is very useful. One example is with the PSRs, the HTTP interfaces. Or if you want the link interface. Combining these multiple things gets convenient. One of the reasons why I personally wanted it as well is for streams. Currently, streams don't have any interface, don't have any classes. PHP basically checks internally when you call certain stream methods. For example, if you try to seek and you provide a user stream, it basically checks if you implement a seek method, which should be an interface. But you can't currently do that. Ideally, you would want stream to maybe be a base class, and instead of having a seekable stream and a rewindable stream, or things like that, you basically just have interfaces. And then if somebody wants a specific type of stream, it's just a stream which is seekable, which is rewindable, and other things. We already have that in SPL, because there's an iterator, and we have a seekable iterator interface, which basically just asks: Oh, is there a seek method? I think it depends how you program. So if you separate the many things into interfaces, then you'll probably use intersection types a lot. If you use maybe a more traditional PHP code base, which uses union types a lot, union types are going to be easier. And you want to reduce that.
Derick Rethans 13:32 Would you think that lots of people already use union types? Because it's pretty new as well, isn't it?
George Peter Banyard 13:38 Union types are being implemented in various different libraries. PSRs are updating the interfaces to use union types. One use case: I also have a special method which takes a date; it takes a union of a DateTimeInterface, a string, or an integer. Although intersection types are really new, when union types were being introduced you heard people saying it would promote bad coding habits, you should have one specific type.
And if you're using a union, you have a design issue. And I had many people complaining to me: why haven't intersection types been introduced first, because intersection types are more useful? But then you see other people telling us: I don't see the point in intersection types. Why would you use an intersection type? Just use your concrete class, because that's what you're going to type anyway.
Derick Rethans 14:21 I can give you a reason why union types were implemented first, over intersection types, I think, which is that it's easier to implement.
George Peter Banyard 14:28 It's easier to implement. And it's more useful for PHP as a whole, because PHP functions accept a union or return a union. Functions return false for error states instead of null. It makes sense why union types were introduced first, because they are mostly more useful within the scope of what PHP does.
Derick Rethans 14:46 Do you think you have anything else to add about intersection types? At the moment, it's already up for voting. When is that supposed to end?
George Peter Banyard 14:54 So the vote is meant to end on the 17th of June.
Derick Rethans 14:57 At the moment I see there's 15 votes for and two against, so it's looking good. What's been your biggest pushback on this? If there was any at all?
George Peter Banyard 15:05 Mostly: I don't see the point in it. However, I do think there are proper reasons why you don't want it, compared to some other features where it's more like have thoughts on what you think design wise. But it is undeniable that you add complexity to the variance. And to the variance check. It is already kind of complicated. I had a hard time reading it initially. There's the whole parser hackery thing, which is kind of not great. It's probably just because we use a restricted parser, because it's faster and more efficient.
Derick Rethans 15:36 I think I spoke with Nikita about parsers some time ago and what the differences between them were. If I remember which episode it was, I'll add it to the show notes.
George Peter Banyard 15:44 And I think the last reason against it is that it only accepts pure intersections. You could argue that, well, if you're adding intersections, you should add the whole feature set. It might impact the implementation of type aliases, because if you type alias T to be a union of A and B, and then you use type T in an intersection, you basically get a mixture of unions and intersections that you need to be able to work with. The crux of this whole feature is the variance implementation. And being able to rationalize the variance implementation, and being able to extend it, I think is the hardest bit.
Derick Rethans 16:18 I guess the next thing still missing would be type aliases, right? Like names for types, which you can't define just yet, which I think you also mentioned in the RFC as future scope.
George Peter Banyard 16:29 Yeah.
Derick Rethans 16:30 Thank you, George, for taking the time today to talk to me about pure intersection types.
George Peter Banyard 16:36 Thanks for having me on the show.
Derick Rethans 16:41 Thank you for listening to this installment of PHP internals news, the podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.
Show Notes
RFC: Pure Intersection Types
Episode #66: Namespace Token, and Parsing PHP
GLR Parser
LALR(1) Parser
Iter Library
Credits
Music: Chipper Doodle v2, Kevin MacLeod (incompetech.com), Creative Commons: By Attribution 3.0
PHP Internals News: Episode 87: Deprecating Ticks
London, UK Thursday, June 3rd 2021, 09:15 BST
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Deprecating Ticks" RFC.
Transcript
Derick Rethans 0:14 Hi I'm Derick, welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 87. Today I'm talking with Nikita Popov about a much smaller RFC this time: Deprecating Ticks. Nikita, would you please introduce yourself.
Nikita Popov 0:34 Hi Derick, I'm Nikita, and I'm working on PHP core development on behalf of JetBrains.
Derick Rethans 0:40 Let's jump straight into what this RFC is about, and that's the word ticks. What are ticks?
Nikita Popov 0:46 Ticks are a declare directive. You write declare ticks equals one at the top of your file, and then PHP will call a tick function after every statement execution. Or if you write ticks equals two, then it will call the function after every two statement executions.
Derick Rethans 1:05 Do you have to specify which function that calls?
Nikita Popov 1:08 Of course, so there is also a register_tick_function and an unregister_tick_function, and that's how you specify the function that should be called, or rather the functions.
Derick Rethans 1:17 How does this work, historically, because the RFC talks about the change being made in PHP seven?
Nikita Popov 1:22 Technically ticks work by introducing an opcode after every statement that calls the tick function depending on the current count. The difference that was introduced in PHP seven is to what the tick declaration applies. The way PHP language semantics are supposed to work is that declare directives are always local.
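A minimal sketch of the mechanism described so far, using the `declare(ticks=1)` directive together with `register_tick_function()` and `unregister_tick_function()`. The variable names are hypothetical, and the exact number of ticks depends on how the engine counts statements, so the sketch only records that the callback ran at all.

```php
<?php
// Enable ticks for the statements that follow in this file only.
declare(ticks=1);

$ticks = 0;

// Hypothetical callback: it just counts how often the engine calls it.
$onTick = function () use (&$ticks): void {
    $ticks++;
};

register_tick_function($onTick);

// Each of these statements is followed by a tick.
$a = 1;
$b = 2;
$c = $a + $b;

unregister_tick_function($onTick);

// Copy the counter once the callback is no longer registered.
$observed = $ticks;
```

Code in any other file that this script includes would not be ticked unless that file carries its own declare directive, which is the file-local behaviour the conversation turns to next.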
The same way that strict types only applies to a single file, ticks should also only apply to a single file. Prior to PHP seven, it didn't work that way. So if you had declare ticks somewhere in your file, it would just enable ticks from that point forward. If you included a different file, or even if the autoloader was triggered and included a different file, that one would also make use of ticks. That was fixed in PHP seven, so now it is actually file local, but that also means that the ticks functionality at that point became, like, not very useful. Because usually if you want to use ticks you actually want them to apply to your whole codebase. There are ways around that. I'm afraid to say that people have approached me after this RFC and told me that they actually do that. The way around that is to register a stream wrapper. It's possible in PHP to unregister the file stream wrapper and register your own one, and then it's possible to intercept all the file includes and rewrite the file contents to include the declare ticks at the top of the file. I do use that general mechanism for real things in other places, but apparently people actually use that to instrument a whole application with ticks, and essentially restore the behaviour we had in PHP 5.
Derick Rethans 3:03 What was the intended use case for ticks to begin with?
Nikita Popov 3:07 Well, I'm not sure what the intended use case was, but at least the main use case was signal handling. The PCNTL extension allows you to register a signal handler, and when the signal arrives, we can't just directly call that signal handler, because signal handlers are only allowed to call functions that are async-signal-safe. Which excludes things like memory allocation, and a lot of other things that PHP uses. What we do instead is we only set a flag that, okay, a signal has arrived, and then we have to actually run the signal handler at some later point in time.
In PHP five, that worked using ticks. You declare ticks, and the PCNTL extension registered the tick handler, and then after this flag was set, it would execute your callback on the next tick. In PHP seven, an alternative mechanism was introduced that is based on virtual machine interrupts. Those were originally introduced for time-out handling, because there we have a similar problem: when the timeout arrives, we might be in some kind of inconsistent state, like in the middle of the allocator right now, and if we just bail out at that point, we are likely to see crashes down the road. So that was a significant problem in PHP five. PHP seven changed that. We now set an interrupt flag on timeout, and then the virtual machine checks this flag at certain points. The interrupt flag is not checked after every instruction, but only, like, just often enough to make sure that it's checked at some point. So that you can't go into an infinite loop that ends up never checking. These points are basically function calls, and jumps that go higher up in the function. PCNTL signals can now use the same mechanism. If you call pcntl_async_signals(true), then those will also set the interrupt flag, and execute the signal handler at the next opportunity, the next time the interrupt flag is checked. The nice thing about that is that it's essentially free. I mean, we already have to do these checks for the interrupt flag anyway; adding the handling for PCNTL signals doesn't add any cost on top. Unlike ticks, which have to be executed on every instruction, or at least regularly, and that does add significant cost.
Derick Rethans 5:28 Execution time itself, because it's an opcode that needs to be executed.
Nikita Popov 5:32 Exactly.
Derick Rethans 5:33 So what are you proposing to do about ticks in PHP eight one then?
Nikita Popov 5:36 I want to deprecate that. So both the declare directive itself, and register_tick_function and unregister_tick_function.
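A minimal sketch of the tick-free signal handling Nikita describes, assuming the pcntl and posix extensions are loaded (they are not available on Windows). The script sends `SIGUSR1` to itself; the short polling loop is only an assumption-free way to give the engine a few chances to check its interrupt flag, and the variable names are hypothetical.

```php
<?php
// Requires the pcntl and posix extensions; $received stays null without them.
$received = null;

if (function_exists('pcntl_async_signals') && function_exists('posix_kill')) {
    $received = false;

    // Dispatch signal handlers via VM interrupts, no declare(ticks) needed.
    pcntl_async_signals(true);

    pcntl_signal(SIGUSR1, function (int $signo) use (&$received): void {
        $received = true;
    });

    // Send the signal to this very process.
    posix_kill(posix_getpid(), SIGUSR1);

    // Loop back-edges and function calls are points where the engine
    // checks the interrupt flag and runs the pending handler.
    for ($i = 0; $i < 100 && !$received; $i++) {
        usleep(1000);
    }
}
```

Compared with the tick-based approach, nothing here has to be added to other files, since the interrupt check is part of normal execution.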
Derick Rethans 5:44 How could users emulate the same behaviour that ticks allow them to have now?
Nikita Popov 5:49 That's a good question. As I mentioned, if the use case of ticks was signal handling, then by using async signals. If it was something else, then you have a problem. My assumption when writing this RFC was basically that signal handling was really the main remaining use case of ticks, because other use cases require this kind of, you know, stream wrapper instrumentation, and I didn't expect that people would be crazy enough to use something like that in production.
Derick Rethans 6:21 Hopefully they cache these rewritten files?
Nikita Popov 6:23 Probably, yeah. I think it's possible to make this integrate with opcache. If you use it for other purposes, then I don't think there is a really good replacement. So I think what they use it for is some kind of, well, instrumentation, so profiling, memory profiling, for example, and the alternative there of course is to use a tool that is appropriate for that job. For example, Xdebug contains a profiler, but of course it is not a production profiler, but I think there are also production profilers.
Derick Rethans 6:54 As far as I know, all the production or APM solutions do this on their own without having to use ticks. They don't need any userland modifications.
Nikita Popov 7:03 Yeah, definitely. All the APM solutions support this, they use internal handlers.
Derick Rethans 7:08 Because it's actually removing functionality that some people use, what's the reaction been to removing this functionality?
Nikita Popov 7:14 Well, on the mailing list at least positive, but as I mentioned, at least some people have pointed out on the pull request that they are using the functionality.
Derick Rethans 7:23 Enough in such a way to sway you towards not deprecating them? What is the benefit of getting rid of ticks, if you don't use them?
Nikita Popov 7:31 That's, I think, the thing: there is not really a big benefit to getting rid of them. Like, they don't add a lot of technical complexity to the engine. They're pretty simple in that sense. Having seen those responses, I'm kind of a bit unsure if we should really remove them, because you could argue that, well, they don't really hurt anyone. I do have to say that for all the things that people use ticks for, all the cases I have heard about, in all of those cases ticks are not the right way to solve the problem. They are not the right way to solve the signal handler problem, they are not the right way to solve the profiling problem. And for the other one I heard, they're also not the right way to solve the heartbeat problem, to make sure a service stays connected. While people do use them, I think they use them for questionable purposes.
Derick Rethans 8:24 Developers, if they're using something to rewrite the PHP file to introduce ticks, can also technically rewrite a file to introduce calls to their own functions after every statement.
Nikita Popov 8:34 Yes, I actually have a very nice PHP fuzzing project that rewrites PHP files to introduce instrumentation functions at certain points. That needs a lot more control than ticks, because it's interested in branching statements in particular. That is definitely also possible, but it's kind of even more crazy than just adding ticks. If you're doing it like this, I think, if we want to keep ticks, then we should change ticks from a declare directive to an ini setting, because this kind of rewriting of files to introduce ticks is not a great solution. On the other hand, that does mean that if you are, I don't know, a library, implementing some code and expecting that, you know, it just runs normally, then someone can, by enabling an ini setting, suddenly run code in the middle of your library file at essentially any point.
So enabling ticks is a major behaviour change, and that's something we really don't like to have in ini settings, which is I guess also why it is a declare in the first place, because that limits the scope. And you have to go out of your way if you want to not limit it, using this rewriting hack. So I'm not really sure ultimately what to do here.
Derick Rethans 9:44 Are you thinking of bringing this up for a vote before PHP eight dot one's feature freeze?
Nikita Popov 9:49 If I decide to go for it, then definitely before. I'm just not completely sure on this topic yet.
Derick Rethans 9:55 It'd be interesting to hear what other people think about removing this. I have no opinion about this. For other features I do, but in this case, I'm happy with them being there, I'm happy with them not being there, because it's not something I'm using myself. In any case, thank you for going through this RFC with me today, and we'll see what happens.
Nikita Popov 10:14 Thanks for having me, Derick.
Derick Rethans 10:18 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.
Show Notes
RFC: Deprecating Ticks
Credits
Music: Chipper Doodle v2, Kevin MacLeod (incompetech.com), Creative Commons: By Attribution 3.0
PHP Internals News: Episode 86: Property Accessors
London, UK Thursday, May 27th 2021, 09:14 BST
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Property Accessors" RFC.
Transcript
Derick Rethans 0:14 Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 86. Today I'm talking with Nikita Popov about his massive Property Accessors RFC. Nikita, would you please introduce yourself?
Nikita Popov 0:32 Hi Derick, I'm Nikita, and I do work on PHP core development, on behalf of JetBrains.
Derick Rethans 0:39 This is probably the largest RFC I've seen in a while. What, in one sentence, are you proposing to add to PHP here?
Nikita Popov 0:46 I would say it's an alternative to magic get and set, just for one specific property instead of all of them. That's the technical side. Maybe I should say something about the motivation behind it, which is that since PHP seven four we have typed properties, and at least for me personally, with that feature, the need to have this typical pattern of a private property for storage plus public getter and setter methods, the main motivation for that has kind of gone away, because we can now use types to enforce any contracts on the value. And now these getter and setter methods are mostly just boilerplate. So the idea with accessors, at least my idea with accessors, is that you really shouldn't use them. You should just have them as a backup option. You declare a public property in your class, and then maybe later, years later, it turns out that, okay, that property actually requires additional validation.
And right now if you have a public property, then you don't really have a good way of introducing that. The only way is to either break the API contract by converting the property into getter/setter methods where you can introduce arbitrary code, or to use magic get/set, which is definitely possible and preserves the API contract, it's just fairly ugly.
Derick Rethans 2:09 You change the public property that people could read into a private one. And because it's private, the set and get magic methods are being called.
Nikita Popov 2:18 Exactly.
Derick Rethans 2:19 This RFC is titled Property Accessors. How do these improve on the situation?
Nikita Popov 2:24 So I think there are really two fairly orthogonal parts to this RFC. The first part is implicit accessors that don't have any custom behaviour, and just allow controlling the behaviour of properties a bit more precisely. In particular, the most important part is probably the asymmetric visibility, where you have a property that's publicly readable, but can only be set from within the class. So public read / private write. I think that's maybe the most common requirement. The second part is where you can actually introduce some custom behaviour. So where you can say that okay, the get behaviour for this property looks like this, and the set behaviour looks like this. Which is essentially exactly the same as what magic get/set does, just for a single property.
Derick Rethans 3:10 For example, when you then do set, you can add additional validation to it.
Nikita Popov 3:14 Exactly. Originally you had a simple public property, then you can add a setter that checks, okay, this string cannot be empty.
Derick Rethans 3:23 Okay, what is the syntax that you're proposing?
Nikita Popov 3:26 I went with essentially the same syntax that's being used in C#. It looks like this: you write public foobar, and then instead of a semicolon you have a code block.
And this code block contains two accessors, so then you have something like get, and another code block that specifies the get behaviour, and set, and the code block that specifies the set behaviour, and so on.
Derick Rethans 3:52 The RFC talks about implicit and explicit implementations of these getter and setter accessors. What is the difference between them and how does it look different in syntax?
Nikita Popov 4:03 Yeah, so the difference is, either you can write just get semicolon, set semicolon, that's an implicit implementation, or you actually specify a code block with real custom behaviour. With the implicit implementation, you're saying that this is really a normal property, and PHP automatically manages the storage for you; it's just that you have this more fine-grained control over how it works. Namely, what you can do is you can say that you have get and private set. That's a property that's publicly read only and internally writeable. You can write just get without set, in which case it's a real read only property, both publicly and privately, or to be more precise, it's an init-once property, so you can assign to it once.
Derick Rethans 4:52 How do you keep track of the init once?
Nikita Popov 4:53 It's the same mechanism as for typed properties, where we distinguish between an initialized and an uninitialized property. You can assign to an uninitialized property, but you can't assign to an initialized one, if it's read only. The only maybe problem there is that this mechanism requires that the property actually is uninitialized to start with, which means that for accessors you don't have any default values. That is to say, there is no implicit default value, no implicit null value. If you want to have a default value, the same as with typed properties, you have to specify it explicitly. Specifying a default value really only makes sense if the property is both readable and writable. For read only properties, if you specify the default then you can't change that.
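To make the "public read / private write" idea concrete, here is a minimal sketch of how that behaviour can be approximated today with the magic get/set workaround mentioned earlier in the conversation. It does not use the RFC's proposed syntax. The class name `User`, the property `name`, and the helper `setName()` are hypothetical and exist only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical class: approximates "public read, private write" with magic
// __get/__set. This is the existing workaround, not the RFC's accessor syntax.
final class User
{
    private string $name;

    public function __construct(string $name)
    {
        $this->setName($name);
    }

    // Called for reads from outside the class, because $name is private there.
    public function __get(string $prop)
    {
        if ($prop === 'name') {
            return $this->name;
        }
        throw new Error("Undefined property: $prop");
    }

    // Called for writes from outside the class; all of them are rejected.
    public function __set(string $prop, $value): void
    {
        throw new Error("Cannot write to $prop from outside the class");
    }

    // Writes only happen inside the class, which is where validation can live.
    private function setName(string $name): void
    {
        if ($name === '') {
            throw new InvalidArgumentException('Name cannot be empty');
        }
        $this->name = $name;
    }
}

$user = new User('Nikita');
echo $user->name, "\n";        // reading from outside is allowed

try {
    $user->name = 'Derick';    // writing from outside is rejected
} catch (Error $e) {
    echo $e->getMessage(), "\n";
}
```

The drawback raised in the episode is visible here: the magic methods apply to every inaccessible property of the class, so each one has to be dispatched by name by hand.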
Derick Rethans 5:37 You have basically created a constant.
Nikita Popov 5:39 Yes, it is essentially a constant.
Derick Rethans 5:41 You mentioned already that PHP 7.4 introduced typed properties. How do these types interact with the setter and getter accessors?
Nikita Popov 5:50 I would say in the obvious way. The getter is required to return the type of the property, modulo the usual weak typing conversions, and the setter also checks before it's called whether the passed value matches the type or not. It enforces that it matches the type.
Derick Rethans 6:08 Does this mean that if you provide an explicit implementation for the set accessor, you also need to specify the parameter name?
Nikita Popov 6:15 No, you can specify the parameter name, and if you don't then it's just passed in as the value variable. It's also inspired by how C# and Swift do it. I mean, there are some possible variations here. We could always require an explicit name, some people are for that, or I also heard that some people would like to have the name of this implicit variable match the name of the property, instead of always being just value.
Derick Rethans 6:41 Would you have to specify the type though?
Nikita Popov 6:43 You wouldn't have to and you're actually not allowed to. So the accessor implementation is somewhat strict about not allowing you to do anything that would be redundant, because otherwise, you know, there are quite a lot of extra things you could be adding everywhere.
Derick Rethans 6:56 That's the same way as marking a property as private, and then the accessors as private as well. Right?
Nikita Popov 7:03 Yeah, exactly. So that will also say: if the property is already private, you can't again say that the accessors are also private.
Derick Rethans 7:11 I think that's the wise thing, otherwise people go overboard with adding private and final and whatever everywhere anyway, right.
Nikita Popov 7:18 One could argue that it's really not our business and this is a coding style question, but you know, it's better to not leave people with the option of doing stupid things.
Derick Rethans 7:28 I saw in the RFC that it is also possible to use references with the get accessor. Does this complicate the implementation and the idea of this RFC a lot, or just a little?
Nikita Popov 7:39 I think the important context to keep in mind here is that we already have magic get/set, and the accessors are largely based on their semantics. Magic getters already have this distinction between returning by value and returning by reference. The by-reference return value is primarily useful for two cases. One, and this is really the important one, is if you're working with arrays. Any write operation on an array, like setting an element or appending an element, requires that the getter returns by reference, because PHP will actually do the modification on the reference. Some people asked about that: why can't we just get the array using the getter, then make the change, and then assign back using the setter? That would theoretically work, but it would be extremely inefficient, and the reason is that this breaks PHP's copy-on-write mechanism. If the array is returned from the getter, then the same array is referenced both inside the property and as the return value. Then we change the return value, and because the array is now shared, we actually have to copy the whole array, and then we assign it back. So effectively what we do is we copy the array, we do a single element change, and then we destroy the old array. That works in theory, but it's so inefficient that we would not want to promote this kind of usage.
Derick Rethans 8:42 The way around this is, of course, having explicit methods on the class to make this change to the array itself, right?
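The by-reference case described above can be reproduced today with the existing magic `__get`, which the accessors are said to be modelled on. The following is a small sketch only; the class `Config` and the key `hosts` are made-up names for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical class used only for illustration.
final class Config
{
    private array $data = ['hosts' => []];

    // The & makes __get return a reference, so array writes from outside
    // modify the stored array in place instead of a temporary copy.
    public function &__get(string $name)
    {
        return $this->data[$name];
    }
}

$config = new Config();
$config->hosts[] = 'a.example';                // append through the reference
$config->hosts['backup']['eu'] = 'b.example';  // a deep write works as well

var_dump($config->hosts);
```

If the `&` is removed from the method signature, PHP reports an "Indirect modification of overloaded property" notice for both writes and the stored array stays empty, which is the limitation a value-only getter would have.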
Nikita Popov 9:10 That would be another option. The problem is that you will need a lot of methods. I mean, it's not just a matter of setting a single element or unsetting an element, but you can also set a deep element, where you're not modifying the outermost array but a multi-dimensional array. You would actually have to pass through that information somehow as well. I don't think there is a simple solution to that problem beyond the reference-based solution that we currently use.
Derick Rethans 9:34 I saw people arguing about not bothering with references in this new implementation at all, but I think you've now made a good case for keeping them.
Nikita Popov 9:42 Effectively, not bothering with references just means not supporting that array use case. Which might be a reasonable limitation, especially if we make a distinction and support it for the implicit accessor case, where we can, you know, do internal magic to support that, and not support it in the explicit accessors case. I mean, people were arguing that this reduces the complexity of the proposal, but it kind of also increases the complexity, because now we are doing something else for the accessors than we're doing for magic get/set, where we already have this established mechanism. I'm not really convinced by that.
Derick Rethans 10:20 And I also think it creates inconsistencies in the language itself, because it does something different with an implicit or explicit accessor, as well as it being different from the original __get magic method.
Nikita Popov 10:34 It's not a secret that I'm not a big fan of references, and I would certainly love to get rid of them, but it's a hard problem, and this array modification behaviour for magic get or for get accessors is certainly a large part of that problem, and I just don't have a good solution for it.
Derick Rethans 10:52 I don't either. The RFC also goes into great detail about inheritance and variance.
Would you have a few words on that?
Nikita Popov 11:00 I think mostly inheritance works like inheritance does for methods, at least that's how it's supposed to work. Of course there are some interactions, because you can for example mix real properties and accessor properties. In which case, if you have a parent accessor property, you can always replace it with a normal simple property, because normal properties support all operations that accessor properties do. What you can't do is the other way around. If you have a parent normal property, then you can't replace that with an accessor property. And the reason is that it does have some limitations. Not a lot, but there are some limitations. One of them is related to references, I mean, we were already talking about this topic. What the by-reference get allows is taking a reference to the property, so you can do something like a reference equals the property. What you can't do is the other way around, where the property reference-equals something else. So you can't assign a new reference into the property. That just doesn't work on a pretty fundamental level, because it would require an additional set handler for set by reference. As we don't particularly love references, adding a new mechanism to support that is not a very popular choice.
Derick Rethans 12:20 Variance wise, I guess the same rules apply as for normal properties and property types?
Nikita Popov 12:27 Approximately. Properties are currently invariant, so you can't change the type, or I mean you can change it but it has to be an equivalent type. If you have a read only property, with only a getter, then the implementation makes the type covariant, which means you can use a smaller type in the child class. This is similar to how if you have a getter method, you could also give it a smaller type in the child class.
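The getter-method comparison can be shown with ordinary methods, which have supported covariant return types since PHP 7.4. This is only a sketch of the analogy, not of the RFC's property syntax; `Shape`, `Circle`, `ShapeBox` and `CircleBox` are hypothetical names.

```php
<?php
declare(strict_types=1);

// Hypothetical types used only for illustration.
interface Shape {}
final class Circle implements Shape {}

class ShapeBox
{
    public function getShape(): Shape
    {
        return new Circle();
    }
}

class CircleBox extends ShapeBox
{
    // A narrower ("smaller") return type than in the parent class.
    // Every Circle is also a Shape, so callers of ShapeBox still get what
    // they were promised.
    public function getShape(): Circle
    {
        return new Circle();
    }
}

$box = new CircleBox();
var_dump($box->getShape() instanceof Circle);
```

A property that can only be read behaves like such a getter from the caller's point of view, which is why narrowing its type in a child class is safe.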
The converse case: if you have a property that can only be set, then the type is contravariant, so you can have a larger type in the child class. Though I should say that properties that can only be set are somewhat odd and really only supported for the sake of completeness, so it might be worthwhile to drop the type-specific behaviour there, because a set-only property should already be really rare, and a set-only property with contravariant inheritance is like an edge case of an edge case.
Derick Rethans 13:24 Would it even make sense to support set-only properties?
Nikita Popov 13:27 Not sure. For the C# implementation, I think they don't support this, and there is a StackOverflow question about that, where people try to convince them that they should support this, that there really are use cases. Currently the imagined use cases are along the lines of injecting values into a class, so using setter injection, just that now it's property-based setter injection. Okay, I'll be honest, I think it doesn't make sense.
Derick Rethans 13:55 To be fair, I don't think so either. It would reduce the length of the RFC a little bit.
Nikita Popov 14:00 A little bit, yes.
Derick Rethans 14:01 Can you say a few words about abstracts, traits, private accessor shadowing and things like that? So a lot of complicated words, maybe you can distil that into something slightly simpler.
Nikita Popov 14:12 Well, I think actually abstract properties are worth mentioning. In particular, the fact that you can now specify properties inside interfaces. If you have public properties, then it makes sense to have them really on the same level as public methods, so they are part of the API contract, and as such should also be supported in interfaces. Typically what the RFC allows is, you can't specify a simple property in the interface, but you can specify an accessor property, which tells you which operations have to be supported.
So you can have a property declaration that says it just has a get accessor, or it has get and set. The implementation of course can always implement more, so if the interface requires get, then you can implement both get and set, but it has to implement at least get, either through an accessor or through a normal property. I think in most cases the implementation will just be a normal property.
Derick Rethans 15:03 Because a normal property would implement an implicit get already anyway?
Nikita Popov 15:07 Yeah.
Derick Rethans 15:08 How do property accessors tie in, or integrate, with constructor property promotion?
Nikita Popov 15:13 They are supported in promotion, with the limitation that it's only implicit accessors. If you use constructor promotion, then you can specify your read only property in there, or a property that is public read / private write. You cannot specify a property with complex behaviour in there. This is mainly because it would mean that you embed large code blocks into the constructor signature, which is, I think, pushing the limits of shorthand syntax a bit. There is nothing fundamental that would prevent it, it's more a question of style.
Derick Rethans 15:50 The RFC talks a little bit about what happens if you use foreach, var_dump, or an array cast on properties with explicit accessors. What are the restrictions here? Is something changing from normal standard properties like we currently have?
Nikita Popov 16:03 I don't think so. So here is once again the case where we have this distinction between the implicit accessors, which are really just normal properties with limitations. So those show up in var_dump, array cast and foreach as usual. And we have explicit accessors, which are really virtual properties, so they don't have any storage themselves. Any storage they use, you have to manage separately somehow. So these don't show up in var_dump, foreach, and so on.
But these actual computed properties don't show up, because that would require us to actually call all the accessors if you do foreach, and that seems rather dubious to me.
Derick Rethans 16:44 How will this work for internal APIs that some extensions use to access a list of all the properties, for say, a debugger?
Nikita Popov 16:51 It'll work the same way as var_dump. I mean, in the end it's not quite based on the same APIs, but still, the answer is the same. You only get those properties that have some kind of backing storage, and those are only the ones that are either normal properties, or the ones with implicit accessors.
Derick Rethans 17:09 That means I need to go find out a way to be able to read the ones with explicit accessors.
Nikita Popov 17:14 Yeah, if you want to. I don't think that the debugger should read those by default, because that means that doing a dump will have side effects, which is not ideal, but maybe you want to have an option to show them.
Derick Rethans 17:26 That's something for me to think about, because I'm pretty sure people are going to want to see the contents of these properties, even in a debugger, even though that could mean that there are side effects, which I'm not keen on.
Nikita Popov 17:36 I guess that's one of the, I would say, advantages of using this over just magic get/set, because you actually know which properties you're supposed to look at. With magic get/set you just don't know at all.
Derick Rethans 17:51 The RFC talks a little bit about the performance impact, and although I saw the numbers I didn't actually read them when preparing for this recording. What are the performance impacts for implicit accessors as well as explicit ones?
Nikita Popov 18:02 The impact is basically that if you use implicit accessors, that has similar performance to plain properties; performance is a bit worse. The reason is essentially that we have some limitations on caching.
So we can't just cache it as if it were a normal property, because it could have asymmetric visibility, and we reuse the same cache slots for reads and writes. I've been thinking about maybe splitting that up, but at least for now there is a small additional performance impact of using implicit accessors, though it's not really significant. On the other side, if you use explicit accessors, those are expensive. They are not quite as expensive as using magic get/set, but they are more expensive than using normal method calls. The reason is basically that normal method calls are very optimized, and they do not have to re-enter the virtual machine, so we just stay in the same virtual machine loop, and we just switch to different stack frames. For magic get/set we actually have to recursively call the virtual machine, because we don't have a good point to re-enter it, at least based on our current APIs. And we also have to deal with some additional stuff, particularly the fact that magic get/set and property accessors both have recursion guards. Normally if you recurse methods in PHP, we don't do any checks about that. Xdebug does, but PHP itself doesn't, so you can infinitely recurse and PHP is fine. The only thing that happens is that at some point you'll run out of memory.
Derick Rethans 19:37 Or when extensions are loaded such as Xdebug, you'll actually still get a stack overflow.
Nikita Popov 19:41 So that's something we should still be addressing, at least the baseline behaviour that you can get to that memory limit error. For properties we instead have recursion guards, which say that if you recursively access a property in magic get/set, it will not call magic get/set again, and will instead access the property as if they didn't exist.
Derick Rethans 20:01 Instead of throwing an error?
Nikita Popov 20:03 Yeah.
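The existing recursion guard for magic get can be observed with a few lines of PHP. This is a minimal sketch of the current magic-method behaviour only, not of the accessor behaviour in the RFC; the class `Lazy`, the counter `$calls` and the property name `title` are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical class used only for illustration.
final class Lazy
{
    public int $calls = 0;

    public function __get(string $name)
    {
        $this->calls++;
        // Reading the same undefined property here does not re-enter __get:
        // the recursion guard makes PHP treat it as a plain missing property,
        // so the null-coalescing operator falls through to the default.
        return $this->$name ?? 'default';
    }
}

$lazy = new Lazy();
var_dump($lazy->title);  // the value produced by the fallback
var_dump($lazy->calls);  // how many times __get actually ran
```

Without the guard, the read inside `__get` would call `__get` again and recurse without bound; with it, the method runs once per outer access.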
For property accessors I'm actually throwing an error on recursion, and the reason for that is if we didn't throw an error, then this would end up accessing a dynamic property of the same name as the accessor, which would technically work, but it's very likely not what the programmer actually intended. It's also going to be really inefficient, because you actually have to allocate space for the dynamic properties and access those. So if you wanted to have some kind of backing storage for the property, then you should just explicitly declare it and access that, rather than accessing something with the same name and implicitly creating a dynamic property.
Derick Rethans 20:41 Yeah, that all sounds very complicated.
Nikita Popov 20:44 It's cleaner to just make it an error and let the programmer fix it, instead of PHP trying to fix it for you.
Derick Rethans 20:51 Are there any BC considerations about the introduction of property accessors?
Nikita Popov 20:55 Not strictly, but I'm sure that it's going to break various assumptions for people, or at least in the sense that, right now, most assumptions should already be broken through magic get/set. I mean, you can always have this kind of magic behaviour. If we have accessors this is probably going to be a lot more common, and people will have to deal with things like properties being publicly readable but privately writable, not just because someone very rarely manually implements that behaviour, but because the language now has native support for it and it's going to be common.
Derick Rethans 21:28 We spoke a little bit about all the different sticking points in this RFC, for example with references, but there's one other thing, and I think it's an argument you make somewhere at the bottom of the RFC, that there is a separation between implicit and explicit property accessors.
I'm wondering whether it would make sense to consider adding only the implicit part of this RFC first, and then maybe later look at adding explicit property accessors.
Nikita Popov 21:54 That's really the main sticking point, and also my own problem with the RFC. I mean, you mentioned at the very start that this is a very long RFC, and it is still a little bit incomplete, so it's going to be longer. It's a fairly complex feature that has complex interactions with other features in PHP. The implementation is actually maybe less complex than you'd think, given the RFC length. The main concern I have is that, at least for me personally, the most useful part of the RFC is the read only properties. The read only properties and the public read / private write properties. I think these two cover like 90% of the use cases, especially because if you have a property that is only publicly readable, then you don't really have to be concerned about this case where you have to, later on, add additional validation. I mean, after all the property is read only, or you control all the sets because they're private. There is no danger of introducing an API break because you have to add additional validation. I think the largest part of the use case of the whole accessors proposal will be covered by these two things. Or maybe even just one of these two things, that's a bit of a philosophical question. There are some people who think we should have just public read / private write and no proper read only properties, because that looks the same from the user perspective, but still gives you more flexibility. I think that's the most important use case, and we could implement that part with a lot less language complexity. So the question is really, does it make sense to have this full accessor proposal, if we could get the most useful part as a separate, simpler feature? And, well, I heard differing opinions on that one.
I was actually pretty surprised that the reception of the full accessors proposal was fairly positive. I kind of expected more pushback, especially as this is the second proposal on the topic. We had an earlier one with similar syntax, even though different details, and that one did fail.
Derick Rethans 24:02 How long ago was that?
Nikita Popov 24:03 Oh, that was quite a while ago actually, at least more than five years.
Derick Rethans 24:06 I think that the mindset of developers has changed in the last five to 10 years. Introducing this 10 years ago would never have happened, or even typed properties, right. It would never have happened.
Nikita Popov 24:17 That's true.
Derick Rethans 24:19 Do you have any idea when you're going to put this up for a vote? Because, of course, the PHP 8.1 feature freeze is coming up not too far from now.
Nikita Popov 24:28 Yeah, I'm not sure about that. I'm still considering if I want to explore the simpler alternatives first. There was already a proposal, another rejected proposal, for read only properties. It was probably called write once properties at the time. But yeah, I kind of do think that it might make sense to try something like that again, before going to the full accessors proposal, or instead.
Derick Rethans 24:54 Do you have anything else to add?
Nikita Popov 24:56 What are your thoughts on this proposal, and the question at the end?
Derick Rethans 24:59 I quite like it, but I also think that it might make sense to split it up. I'm always quite a fan of splitting things up in smaller bits, if that's possible to do, and still provide quite a lot of use out of it. And that's why I was asking whether it makes sense to split it up into the implicit part and the explicit part, especially because it makes the implementation and the logic around it quite a bit easier for people to understand as well.
Nikita Popov 25:24 It's maybe worth mentioning that Swift also has a similar accessor model, but it is more like a composition of various different features, like read only properties, properties with asymmetric visibility, and then finally properties with fully controlled get and set behaviour, rather than this C# model where everything is modelled using accessors with appropriate modifiers. So there is certainly precedent in other languages for separating these things.
Derick Rethans 25:55 Something to ponder about, and I'm sure we'll get to a conclusion at some point. Hopefully some of it before PHP 8.1 goes into feature freeze, of course. We've been chatting for quite a while now, I think we should call it the end for this RFC. Thank you for taking the time today to talk about property accessors.
Nikita Popov 26:11 Thanks for having me, Derick.
Derick Rethans 26:12 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.
Show Notes
RFC: Property Accessors
Credits
Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 85: Add IntlDatePatternGenerator
London, UK, Thursday, May 20th 2021, 09:13 BST
In this episode of "PHP Internals News" I discuss the Add IntlDatePatternGenerator RFC with Mel Dafert (GitHub). The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 85. Today I'm talking with Mel Dafert about the "Add Intl Date Pattern Generator" RFC that she's proposing for inclusion into PHP 8.1. Mel, would you please introduce yourself?
Mel Dafert 0:35 Hello, I am Mel. I've been working professionally with PHP for about three years. Recently I started reading the internals mailing list in my free time, but this is my first time contributing.
Derick Rethans 0:46 What made you start reading the PHP internals mailing list?
Mel Dafert 0:50 I generally like reading mailing lists and issue trackers. And since I work with PHP, it was interesting to read what's happening.
Derick Rethans 1:02 That's what I'm trying to do with this podcast as well, of course: explaining what happens in PHP development. But let's get to your RFC. What is the problem that you're trying to solve with this?
Mel Dafert 1:14 Currently, PHP exposes the ability for locale-dependent date formatting with the Intl Date Formatter class. It has basically only three options for the format: long, medium and short. These options are not flexible enough in some cases, however. For example, the most common German format is day, dot, numerical month, dot, long version of the year.
However, neither the medium nor the short version provides this. They use either the long version of the month or a short version of the year, neither of which was acceptable in my situation.
Derick Rethans 1:47 I realize that you basically ran into a problem where PHP wasn't doing something you wanted it to do. But what made you actually want to contribute this?
Mel Dafert 1:57 I ran into this exact problem at work, where I wanted to format dates in this specific way. After some research, I found out that ICU, the library that powers Intl Date Formatter, exposes exactly this functionality already. It would be relatively easy to wire this up into PHP and expose it there as well. I also found in a bug report that other people had this problem as well, so I decided to try my best at hacking at the PHP source and make it available to everyone using PHP.
Derick Rethans 2:25 Had you ever seen the PHP source code before?
Mel Dafert 2:28 I don't think so. No.
Derick Rethans 2:29 But you are familiar with C a little bit?
Mel Dafert 2:32 On a very basic level, yes.
Derick Rethans 2:34 As part of this RFC, what are you suggesting to add to PHP?
Mel Dafert 2:39 ICU exposes a class called date time pattern generator, to which you can pass a locale and a so-called skeleton, and it generates the correct formatting pattern for you. The skeleton just lists which parts are supposed to be included in the pattern, for example the numerical date, numerical month, and the long year, and this will generate exactly the pattern I wanted earlier. It is also a lot more flexible. For example, the skeleton can also just consist of the month and the year, which was also not possible so far. I am proposing to add an Intl Date Pattern Generator class to PHP, which can be constructed for a locale, and exposes the get best pattern method that generates a pattern from a skeleton for that locale.
Derick Rethans 3:22 The skeletons, what do you specify in these skeletons?
Mel Dafert 3:27 It's a similar format to the pattern itself. For example, lowercase y, lowercase y, uppercase M, uppercase M would give you only the year and only the month. If I'm correct, that's exactly what the skeleton looks like.
Derick Rethans 3:43 But it puts it in the right order?
Mel Dafert 3:45 It puts it in the right order, and in some cases also adds extra characters, or even changes the format slightly, depending on the locale.
Derick Rethans 3:55 So it is a flexible way to tell the Intl extension to format them in a slightly more, well, how do you say this, a slightly more intelligent way than what the standard long, short and medium constants do for you.
Mel Dafert 4:11 Exactly.
Derick Rethans 4:12 Why is it so important that you get these formats right, or rather I should say, how do these locales influence formats and why is this important?
Mel Dafert 4:21 The conventions for how to format dates and times vary rather strongly between languages and countries. In Austria, for example, nobody would be expected to understand the US format of month slash day slash year. I assume people in England may have the same issue.
Derick Rethans 4:38 I think everybody has that issue except for people in the US.
Mel Dafert 4:42 But that only shows the importance of using a format that people are used to and understand. Other languages, like mainland Chinese, even have the words for day and month included in the format, as far as I understand. I don't speak Chinese.
Derick Rethans 4:59 Neither do I, but a long time ago, when I added the date time support, not Intl, but PHP's standard date time support, I also looked at the locales that operating systems have. And even these locales, which are not something that Intl uses now, also encode these extra characters, at least for Japanese, so that was interesting to see there as well.
Mel Dafert 5:22 There are a lot of sometimes somewhat unexpected formats.
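As a rough sketch of the skeleton-to-pattern step described above, the following uses the `IntlDatePatternGenerator` class as it shipped in PHP 8.1 and needs the intl extension to be loaded. The locale `de_AT`, the skeleton `yyyyMMdd` and the date are example values chosen here, not taken from the episode.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.1+ with the intl extension.
// Locale, skeleton and date below are example values.
$generator = new IntlDatePatternGenerator('de_AT');

// Skeleton: long year, numeric month, numeric day. The order written here
// does not matter; the generator decides order and separators per locale.
$pattern = $generator->getBestPattern('yyyyMMdd');

$date = new DateTimeImmutable('2021-05-20');

echo 'Pattern: ', $pattern, "\n";
echo 'Date:    ', IntlDateFormatter::formatObject($date, $pattern, 'de_AT'), "\n";
```

The exact pattern string depends on the ICU data bundled with the installation, so it is printed rather than hard-coded. Swapping the skeleton for `yyMM` gives the month-and-year case mentioned in the conversation.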
Derick Rethans 5:27 And I think German sometimes once the add the in front, and sometimes behind and things like that. I know there's lots of little intricacies, yes. I see that the RFC makes an argument about which name to pick for the new class. Can you elaborate on the two different options that there are? Mel Dafert 5:44 Yes, this is certainly what I would call bike shedding. ICU has something of an inconsistency in its naming. The formatting class is called date formatter. And the pattern generator class is called Date Time pattern generator. Derick Rethans 6:00 So it has the extra word time in it? Mel Dafert 6:03 The choice is between some inconsistency with Intl Date Formatter, which already exists in PHP, and the Intl Date Time pattern generator, or making sure PHP is internally consistent and omitting the time in all cases. So far consensus seems to lean towards the second option. This is also what the Hack people decided to use. Derick Rethans 6:24 And I believe that's the one you are wanting to go with in this RFC as well, right? Mel Dafert 6:28 Exactly. So far, everybody voted slightly, or like expressed themselves to slightly favour the version without time. So that's the one I'm going with. Derick Rethans 6:40 Of course, as you mentioned, this is a fairly small change, but the RFC talks a bit about things to add in the future, because I believe you weren't suggesting to add all of this Intl functionality straightaway. What is this future scope? Mel Dafert 6:55 ICU would also expose more methods around the skeletons, for example, turning a pattern back into its skeleton, or building a list of skeletons and then mapping them to the patterns from scratch. That's what you would do in theory if you added your own special locale to this. Derick Rethans 7:17 I'm not sure how to do that with PHP actually, but I think ICU allows you to build your own basically files with settings right? Mel Dafert 7:25 Exactly.
I have omitted all of this, for simplicity, and because I couldn't think of a use case for it, personally, at least. If someone does need them, they could easily be added. It would just be a bunch of extra methods on the class. Derick Rethans 7:43 I know that ICU has so much functionality that hasn't been exposed to PHP, because there's just so much of it right? Mel Dafert 7:50 Extremely, yes. I did see that Hack decided to expose all of them, like all the methods that the class has, but I really don't see the use of having to document and test all of these methods when really only one is going to be used. So I've decided to just go for the one that I can actually see people using. Derick Rethans 8:14 And it is always easier to get smaller parts added to PHP than big things, to begin with. Mel Dafert 8:21 Exactly. Derick Rethans 8:22 How has the reception been so far? Mel Dafert 8:24 I haven't gotten feedback from too many people, but it seems positive so far. The few people that did give some feedback were constructive and seemed to like the idea of adding this. Derick Rethans 8:36 I reckon outside of English speaking countries this is quite an important thing to actually support, especially as we just discussed, people are picky about how these things are formatted. Mel Dafert 8:46 Very picky. Derick Rethans 8:48 So the name that you're going for would be Intl Date Pattern Generator, would it also support patterns for the time itself? Mel Dafert 8:55 Of course, just like Intl Date Formatter also supports formatting time. Derick Rethans 9:02 It would be strange if it didn't, to be honest. Mel Dafert 9:04 Yeah. Derick Rethans 9:05 When do you think you're going to put this up for a vote for inclusion in PHP 8.1? Mel Dafert 9:10 I think I sent out the first email about two weeks ago for opening the discussion. So I was planning to send out the heads up, either today or tomorrow, and opening the vote after that. Derick Rethans 9:23 Okay.
To be fair, I think there is very little controversy in this one, so it would surprise me if it didn't pass. Mel Dafert 9:30 That's reassuring. I am somewhat anxious about them. Derick Rethans 9:33 It's not controversial, it is perhaps a niche thing but it is something that is useful, so I can't see people really opposing this. To be fair, I think it looks like just an omission from when the Intl extension was written in the first place. Mel Dafert 9:48 That's true. It might have not been supported in ICU at that point. Derick Rethans 9:54 That is a good point as well because I think the Intl extension came with PHP five three, or five four, which I think is now eight years ago or something like that. Mel Dafert 10:04 I think ICU might not have had it at the time. It's in all, like it's in all supported versions of PHP. Derick Rethans 10:13 That is good to know. Would you have anything else to add? Mel Dafert 10:16 No, I think that's it. Derick Rethans 10:17 Thank you for taking the time today to talk to me about your proposal to add the Intl date pattern generator to PHP 8.1. Mel Dafert 10:25 Of course. Thank you for having me. Derick Rethans 10:29 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Add IntlDatePatternGenerator Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers
PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers London, UK Thursday, May 13th 2021, 09:12 BST In this episode of "PHP Internals News" I converse with Ben Ramsey (Website, Twitter, GitHub) and Patrick Allaert (GitHub, Twitter, StackOverflow, LinkedIn) about their new role as PHP 8.1 Release Managers, together with Joe Watkins. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick, welcome to PHP internals news, a podcast, dedicated to explaining the latest developments in the PHP language. This is episode 84. Today I'm talking with the recently elected PHP 8.1 RMs, Ben Ramsey and Patrick Allaert. Ben, would you please introduce yourself. Ben Ramsey 0:34 Thanks Derick for having me on the show. Hi everyone, as Derick said I'm Ben Ramsey, you might know me from the Ramsey UUID composer package. I've been programming in PHP for about 20 years, and active in the PHP community for almost as long. I started out blogging, then writing for magazines and books, then speaking at conferences, and then contributing to open source projects. I've also organized a couple of PHP user groups over the years, and I've contributed to PHP source and Docs and a few small ways over the years, but my first contributions to the project were actually to the PHP GTK project. Derick Rethans 1:14 Oh, that's a blast from the past. You know what, I actually still run daily a PHP GTK application. Ben Ramsey 1:21 Oh, that's interesting. What does it do? Derick Rethans 1:23 It's Twitter client. Ben Ramsey 1:24 Did you write it. Derick Rethans 1:26 I did write it. 
Basically I use it to have a local copy of all my tweets and everything that I've received as well, which can be really handy sometimes for figuring things out; because I can easily search over it with SQL, it's kind of handy to have. Ben Ramsey 1:41 It's really cool. Derick Rethans 1:42 Yep, it still runs PHP 5.2 maybe, I don't know, five three, because it hasn't really been updated since then. Ben Ramsey 1:49 Every now and then there will be some effort to try to revive it and get it updated for PHP seven and eight, but I don't know where that goes. Derick Rethans 1:59 I don't know where that's gone either. In this case, for PHP 8.1 there are three RMs, there's Joe Watkins who has done it before, Ben, you've just introduced yourself, but we also have Patrick Allaert. Patrick, could you also please introduce yourself. Patrick Allaert 2:13 Hi Derick, thank you for the invitation for the podcast, my name is Patrick Allaert. I am a Belgian freelancer, living in Brussels, and I spend half of my professional time as an IT architect and/or a PHP developer, and the other half, I am maintaining the PHP extension of Blackfire, a performance monitoring solution, initiated by Fabien Potencier. Derick Rethans 2:39 I didn't actually know you were working on that. Patrick Allaert 2:40 I'm not talking much about it but more and more. So I succeeded Julien Pauli, who by the way was also a release manager before, so now I'm working with the Blackfire people. It's really great, and this gives me the opportunity to spend about the same amount of time developing in C and in PHP. This is really great because at least it's not just only doing C. At least I connect with what you can do with PHP. I see the evolution from both sides. And this is really great. It's also thanks to you Derick, you granted me access to the PHP source code. That was to contribute to testfest something like 12 or 13 years ago, it was CVS at that time.
Derick Rethans 3:28 CVS, so now I remember that. Basically, what both of you are doing is making me feel really old and I'm not sure whether I like that or not. I think we all have gotten less hair on our heads and greyer in our beards. In any case, what made you volunteer for being the PHP 8.1 RM? Patrick Allaert 3:46 In my case, I think there were two reasons. One is that PHP really brings a lot to me in my career, everything is built around my expertise in PHP and its ecosystem. By volunteering as a release manager, I think I can give something back to PHP, because the last time I contributed to the source code of PHP, it was really years ago. If I remember, it was array to string conversion that was very silent and not emitting any notice; now it's a warning, in the meantime. So I think that was PHP 5.0. Derick Rethans 4:22 Ages ago. Patrick Allaert 4:23 Ages ago. Indeed. I was quite passive, I was mostly reading on PHP internals, and most of the time now that it is quite big, if I had to say something I could always see someone who already just said the same thing, so I was not saying: plus one. This is one of the reasons, and the second one I think is that it's kind of a unique opportunity, and I can learn a couple of things. I think, on day one when Rasmus gave me the access, saying that I can do OAuth authentication on SSH, that was: okay, day one I already learned something, so that was really cool. Derick Rethans 4:58 And you Ben, I think you tried to be the PHP 8.0 release manager as well at some point. That didn't happen at the time, but you've tried again. Ben Ramsey 5:06 I almost didn't try again. I don't know why but when Sara announced it this year, I thought about it, and I don't know, I tossed it around a little bit, but I've been wanting to do it for a long time and I've noticed, as Joe Watkins recently put it in a blog post, that we need to help the internals avoid buses.
So since this is a programming language that I've spent a lot of time with just as Patrick mentioned, both in and out of my day jobs. I want it to stick around to thrive. Since I'm not a C guru, but I do have a lot of experience managing open source software. I wanted to volunteer as a release manager, and I hope that I can use this as an opportunity to inspire others who might want to get involved, but don't know how. Derick Rethans 5:55 And of course you just mentioned Joe, Joe Watkins, who is the third PHP release manager for 8.1, and that is a bit of a new thing because in the past, when the past many releases I can remember you've only had two most of the time. Ben Ramsey 6:09 I think, on the mailing list that came up early on in the thread, and there was a general consensus, I think, consensus may be the wrong word, but there were a couple of people who spoke up and said that they wouldn't mind seeing multiple rookies or mentees or whatever you want to call us, and Joe when he volunteered to be the veteran, and he was the only one who volunteered as the veteran. He said that he would take on two. And so that's that's why Patrick and I are both here and I think that's a good idea, because it will continue to help, you know, us to avoid buses. Derick Rethans 6:46 Yep. And if you're three, you only have once every 12 weeks. Whereas of course, in my case doing it for PHP 7.4 it's every four weeks, because it's me on my own, isn't it. Which is unfortunate that these things happen because people get busy in life sometimes. Getting started being a PHP release manager can be a bit tricky sometimes because just before we started recording, I had to add you to a few mailing lists. Do you think you've now have access to everything, or what do you need access to to begin with? 
Patrick Allaert 7:18 There is the documentation about release managers, what you are supposed to do, and there is an effort of documentation on what you have to ask for in terms of access, and that's great. We are probably going to contribute with our findings to improve the documentation. Once you did a bit of the setup, you mainly need to access the servers. You should also know what the workflow is and what the usual tasks are. This is mentioned in the documentation, but I think it would be better to have a live discussion with someone that already did it. The fact that we are doing it with Joe Watkins, who is not only a release manager of 8.1, but also a previous release manager, that should make it really smooth to see what the order is and what the routine is. So, what do you think, Ben? Ben Ramsey 8:16 I agree. I think that, I mean we've only just gotten started. It was only, I believe, what was it, two weeks ago that this was announced. So this is the first time that Patrick and I have actually spoken face to face. Hi, Patrick! We've communicated by email and slack. I'm sorry not Slack, StackOverflow chat. Joe has given us a lot of good pointers. I feel like some of the advice he's given has been really good, but it's like Patrick said, we haven't really had like a live, like one on one chat, or face to face chat, where we could kind of get caught up on things and understand what the flow looks like. So last week I started going through a lot of the pull requests on GitHub. And I've been tagging them as bug fixes or enhancements, and there's also an 8.1 milestone that I've been adding to a lot of the tickets, or the pull requests, and I've merged a few of them, but I think that I've merged them a little prematurely. So there were some funny things that came up out of that.
I do plan to blog on this, but one of Nikita's comments in the Stack Overflow chat was, you've just made it your personal responsibility to add tests for uncovered parts of the Ristretto255 API. Derick Rethans 9:40 Right, exactly say because I'm doing release management for PHP seven four. I don't do any merging at all. The only thing I'm doing is making the packages, and then coordinating around them. I'm not even sure whether it is a responsibility of a release manager to do. Ben Ramsey 9:55 It may not be a responsibility. I felt like it was helpful maybe to go ahead and take a look and see where things were trying to follow up with people, to get them to respond if something had been sitting there for two weeks or so without any kind of movement. I would, you know, leave a message saying what's the status of this. Derick Rethans 10:19 I know from the documentation that we have on our Release Management process. And many of these steps actually been replaced by a Docker container that actually builds the binaries, so I'm not sure whether Joe I've mentioned that to you yet, because I'm not sure whether that was around when he did release management, the previous time. Ben Ramsey 10:36 Right, it wasn't around either when he did release management, but he's also mentioned that he would like for us to learn how to do it without the Docker container, even if we do plan to use the Docker container. Derick Rethans 10:48 That's fair enough, I suppose. I have never had to do that, but that there you go. Now, what is the timeline like? Patrick Allaert 10:56 In terms of timeline I think the very first thing is being all three release managers having live discussion to define what, what we should do, when we should do, and how. This way we clearly knows our responsibility and the sequence, and also how we are going to organize. Do we do every three releases? We share the task? How are we going to do the work together. 
In terms of timeline I think the very first release is going to happen in June, if I remember correctly. I set up an agenda sheet with ICAL so that we all can put that in our calendar, nothing really clear on my side. Derick Rethans 11:41 From what I can see from the to do list that the first alpha release is June, 10, which is exactly a month away from when we are recording this. Patrick Allaert 11:51 Right, yeah, it's one month come down before the very first one. I think it might be great that the very first release being made by by Joe, so that we can really see every single step he's doing, so that we can do the same. However, I guess it's kind of a shared responsibility to do triage of bugs and pull requests. Ben Ramsey 12:14 Right. I think there is some desire among the community to see these releases in real time at least a few of them. So I'm going to try to encourage us to stream some of them maybe live, or at least record it and put it up somewhere for people to kind of just see the process to demystify it, so to speak. Derick Rethans 12:35 I actually tried it a few months ago to record it, but there were so many breaks and pauses and me messing things up, and me swearing at it, that I had to throw away the recording. I mean the release went out just fine but like absolute as again... I can imagine the first few times, you're trying this there might be some swearing involved, even though you might not vocalize that swearing. Ben Ramsey 12:56 Oh I'll vocalize it. Derick Rethans 12:58 Fair enough. This is something that is that you're going to have to do for the next three and a half years. Do you think you'll be able to have the time for it in another three years? Ben Ramsey 13:08 I mean for myself I I'm committed to it, I definitely believe that I'll have the time over the next three and a half years, and I'll make the time for it. Derick Rethans 13:18 What about you, Patrick? Patrick Allaert 13:20 Exactly the same. 
I think it's the least that I can do for PHP, in terms of contributing back. There will be some changes, because it's not like the infrastructure is something that doesn't change, like for example recently, GitHub having more focus rather than our Git infrastructure. So the changes that will happen, we will have to adapt to; I have the impression that a release manager has to adapt to changes every time, and that will be very interesting. Derick Rethans 13:53 Luckily we haven't had too many. The only thing I had to change with the change from git.php.net to GitHub was my local remote URLs. So there wasn't actually a lot to do, except for running git remote set-url. I was pleasantly surprised by this because, if anything, messing around with Git isn't my favourite thing to do. Ben Ramsey 14:14 Also, merging is a little bit more streamlined now, you don't have to go to qa.php.net to do that. Derick Rethans 14:21 I've never done anything with that. Ben Ramsey 14:23 Really? Oh, I guess you would commit directly to git.php.net? Derick Rethans 14:27 Yep. Ben Ramsey 14:28 If there were PRs on GitHub, the only way to merge them, well, it probably wasn't the only way, but one way to merge them was to go to qa.php.net, and if you were signed in with your PHP account, you were able to see all the pull requests, and choose to merge them. Derick Rethans 14:46 Yep, also something I've never done as an RM. The only way I have interacted with pull requests is commenting on the pull requests, and I wouldn't merge them myself. With the only exception of security releases where you need to cherry pick from certain branches into your release branches. I'm not always quite sure whether it is the responsibility of release managers to actually do the merging into the main branches.
From what I've understood it's always the people that made the contributions who just merge themselves, and you then sometimes need to make sure that they merge into the right branch instead of just master, which is what, as far as I know, the buttons on GitHub do. Ben Ramsey 15:21 Well the individual contributors, in this case, if they're doing like a bug fix or something, most of them, or many of them, don't have permission to do the merging, so someone else has to merge it, like, often I see Nikita merging a lot of the pull requests. Derick Rethans 15:37 Maybe I've just been relying on Nikita to do that then. I'm not sure how bug fixes are merged into bug fix branches. I think it's usually been done by people that have access already anyway, because it's often either Nikita or Christoph Becker, or Stas, and the main developments, or the main other new things from people that don't have access, are usually to master. So I guess there's a bit of a difference now. I'm not sure I have any other questions, actually. Would you have anything to add yourselves? Patrick Allaert 16:05 Maybe something that would be quite challenging is the very recent discussion about the system that we might change from: the issue tracker where we have all the bugs. I understand the current issues, I understand as well the drawbacks of what is possibly, for example, GitHub issues. It might be great for some, would it be great for us? If we do it, it is going to bring a lot of changes, and I think 8.1 will already be slightly impacted by the change to GitHub in terms of pull request strategies, but potentially there will be another change, which is around the bug tracking system. Derick Rethans 16:54 I have strong opinions about this, but we'll leave that for some other time. What about you, Ben?
Ben Ramsey 17:00 Right, I actually don't think that we're going to end up making a lot of changes in that regard, very, like, not in the near term, probably. But I did want to point out, or promote that I've started journaling some of these experiences, and capturing information mainly for my own purposes, but I'll be posting these publicly so that others can follow along. My blog is currently down right now. Derick Rethans 17:28 That's because you're using Ruby isn't it? Ben Ramsey 17:30 That's because I'm using Ruby. The short story of it is that there are some gems that were removed from the master gem repository at some point in the past, or the versions I'm using were removed, either for security reasons or what I have no idea why. And that's put, put it into a state where I just can't easily update. I just haven't, I just don't care, right now, so I plan on migrating to something else. In the short term, I'm not going to be doing that. So I've started writing at https://dev.to/ramsay and Dev.to is just a developer community website. If you're on Twitter. It's run by @thePracticalDev, I'll be, I'll be blogging there. Derick Rethans 18:18 And I'll make sure to add a link to that in the show notes as well. Thank you for taking the time this afternoon, or morning, to talk to me about being a PHP 8.1 release managers. Ben Ramsey 18:28 Thank you for having me on the show. Patrick Allaert 18:30 Thank you, Derick for that podcast. I'm really glad you invited us. Derick Rethans 18:39 Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. 
Show Notes PHP 8.1 Release Todo List Ben's journal Joe Watkins' Avoiding busses blog post Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 83: Deprecate implicit non-integer-compatible float to int conversions
PHP Internals News: Episode 83: Deprecate implicit non-integer-compatible float to int conversions London, UK Thursday, April 29th 2021, 09:11 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Deprecate implicit non-integer-compatible float to int conversions" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 83. Today I'm talking with George Peter Banyard, about another tidying up RFC, George, would you please introduce yourself? George Peter Banyard 0:31 Hello, my name is George and I work on PHP in my free time. Derick Rethans 0:35 Excellent. I was just talking to Larry Garfield, and he was wondering whether you or himself, are the second often guests on this podcast, but I haven't run a stats. But it's good to have you on again. Following on for from other numeric RFCs, so to speak. This one is titled deprecate implicit non integer compatible floats to int conversions. That is a lovely small title you have come up with. George Peter Banyard 1:01 Yeah, not the best title. Derick Rethans 1:03 What is the problem that this RFC is trying to solve, or rather, what's the change that is in this RFC is trying to solve? George Peter Banyard 1:11 Currently in PHP, which is a dynamic language, types are not known at the statically at compile time, so it's so everything's mostly runtime. And most type conversions are relatively sane now in PHP 8, because like numeric strings have been kind of fixed. But one last standing issue is that floats will pass an integer check, without any notices or warnings. 
Although floats don't usually fit in an integer: they will have like extra data which can't be represented as an integer. For example, they can have a fractional part, or they can be infinity, or not a number if you divide, like infinity by infinity, or 0 over 0 or other things like that. Derick Rethans 1:55 These are specific features of floating point numbers on computers? George Peter Banyard 1:59 Yes. Derick Rethans 2:00 Is there any prior work that this RFC is building on top of? George Peter Banyard 2:03 It builds on top of the saner numeric strings RFC, because it tries to like make the whole numericness of PHP, as a concept, better and like less error prone, but in essence it's mostly self contained. If you use a floating point number where you should be using an integer: if the floating point number is considered an integer because it only has like decimal zeros, and it fits in the integer range, then you'll have like no error. So if you use 15.0 as an array key, it gets converted to 15, you don't get any error because it's like, well, it's just 15, like it doesn't mind. But if you do 15.5, then you'll get like a deprecation notice which will tell you it's like, well, here the key gets implicitly converted, you should be aware of this because if you use 15 somewhere else, you'd be overriding the value. Derick Rethans 2:54 And that currently doesn't happen yet. And you say, fits in the integer range, what ranges are we talking about here? George Peter Banyard 3:01 On 32 bit, which I would imagine most people don't use any more, it's just like minus 2 billion to 2 billion, because PHP integers are signed, and on 64 bits, it's like nine quintillion? Derick Rethans 3:15 It is a 64 bit range.
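The rule George describes here (no fractional part, not infinity or NaN, and within the integer range) can be sketched in a few lines. This is only an illustration, not code from the RFC or the PHP engine: it is written in Python because Python floats are the same IEEE 754 doubles, the name `is_integer_compatible` is made up for this example, and the 64 bit range is an assumption about the platform.

```python
import math

# Assumption: a 64 bit platform, so integers are signed 64 bit values.
INT_MIN = -(2 ** 63)
INT_MAX = 2 ** 63 - 1


def is_integer_compatible(value: float) -> bool:
    """Return True if `value` could become an integer without losing data."""
    # Infinity and NaN have no integer representation at all.
    if not math.isfinite(value):
        return False
    # A non-zero fractional part would be dropped by the conversion.
    if not value.is_integer():
        return False
    # The whole number must also fit in the signed integer range.
    return INT_MIN <= int(value) <= INT_MAX


if __name__ == "__main__":
    for candidate in (15.0, 15.5, float("inf"), float("nan"), 2.0 ** 63):
        print(candidate, is_integer_compatible(candidate))
```

In terms of the episode, 15.0 is the kind of value that would stay silent, while 15.5, infinity, NaN and an out-of-range value are the kinds of values the deprecation notice is aimed at.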
George Peter Banyard 3:17 You should be fine by not hitting them, but like if you do some maths computation with it, you might hit like the boundaries, or you do like very edge cases where you try to like mess up with PHP, like that so it's something which you can do it. Derick Rethans 3:30 From what I understand floating point numbers they store, integer things as well as fractional things, and from what I remember is that the range that floating point numbers can store things in without losing any precision is something like 53 bits I think. So if it's larger than 53 bits, then it would have to store something in its floating point part of it, and hence starts losing numbers. George Peter Banyard 3:56 Yeah, I'm not the expert on floating point numbers. Derick Rethans 4:00 They are tricky. George Peter Banyard 4:01 They are tricky and the standard is very confusing at times, especially with like NaNs and like signalling NaNs and like, but the basic concept is like exponential numbers like exponential like scientific notation. You have like your base number, and then you have like a power, and then just gives you like a larger range, but with exponential scientific notation, you also lose like precision, because you don't really care about like the minute numbers. Derick Rethans 4:24 This is why there is a conversion issue both ways right, a floating point number, without fraction or NaN or INF, will always fit in the 64 bit integer. George Peter Banyard 4:32 Yeah. Derick Rethans 4:33 But although you can represent a 64 bit integer in a floating point number. You can't do that without losing data if the integer takes up more than 53 bits, so the round trip conversion also has issues. George Peter Banyard 4:44 That's why, like, that's the check. I'm using, because initially I was just doing an F mod check, because I was like oh just check for like fractional parts, and then Nikita was like: Well, you probably should do round trip checks. 
Because you also catch infinity, not a number, which also has like some interesting implications, that like if your floating string is considered infinity and you cast it to an integer, you get max int. If it's a floating point number, you'll get zero, which is an handy thing that needs to be like also dealt with, because I just discovered that while working on that. Trying to already get rid of like conversions, is I think a good first step on making most things sane. And we already do that with string offsets. So it's also just like making it more of a global aspect of the language. Derick Rethans 5:35 So this RFC only talks about converting floats to int, but not int to float? George Peter Banyard 5:40 Yeah, because mostly integer to float is a, is a safe conversion, because you can, it fits usually in a floating point number, except apparently 64 bits. Derick Rethans 5:52 I think it is something that we actually should also look at and this is not something I'd realized because I originally thought that before reading the RFC, this that is what you were trying to get, but it's they other way, the other the way around here; so I can see another upcoming RFC to do the other side of the conversion as well. George Peter Banyard 6:11 I imagine so. I put that on my to do list, which is already growing larger and larger with every small idea. I encountered in the language which I'm like, why on earth is PHP doing that? Derick Rethans 6:22 But to get back to this RFC in which kind of situations can this trip up developers? George Peter Banyard 6:27 I would expect most of the time it shouldn't, because this every time you use integers, floating points is mostly maths code, or, if you're doing something very weird, like storing money as a floating point number, which you shouldn't do, but people do it anyway. Derick Rethans 6:45 Does PHP have an arbitrary precision type? George Peter Banyard 6:48 No we don't. But you can use GMP. 
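The remark that integer to float is mostly safe "except apparently 64 bits" ties back to the 53 bit limit Derick mentioned. A minimal sketch of that limit, again in Python purely because its floats are IEEE 754 doubles like PHP's; the helper name `survives_float_round_trip` is invented for this illustration and is not part of any PHP API.

```python
# Every whole number up to 2**53 has an exact double representation;
# above that, some integers have to be rounded to a neighbouring value.
LIMIT = 2 ** 53


def survives_float_round_trip(number: int) -> bool:
    """Return True if int -> float -> int gives back the same integer."""
    return int(float(number)) == number


if __name__ == "__main__":
    for candidate in (LIMIT, LIMIT + 1, 2 ** 63 - 1):
        print(candidate, survives_float_round_trip(candidate))
```

So a 64 bit integer that needs more than 53 significant bits can change value when it passes through a float, which is the other direction of the conversion problem discussed above.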
Derick Rethans 6:52 I don't know what GMP stands for either. There also used to be BCMath, is that still around as well? George Peter Banyard 6:58 Yeah, BCMath is still around. Most of the time you don't need arbitrary precision, at least for traditional PHP code, which is web based and possibly e-commerce, so you're not hitting insane numbers. This is mostly for edge cases, or for string conversions to integers; I think my main point is to also try to make string conversions to numeric types safer. The previous RFC was Saner Numeric Strings, and there's maybe an expectation that you can finally bypass the strict types mode, because everything is strings in HTTP land. So if you get a value, you just want to let the engine take care of checking that it's a valid number and that it doesn't lose precision, and you get an integer. That makes it helpful that you get these warnings and notices, so that hopefully in PHP 9, which is who knows how many years away, we can finally lock down these edge case behaviours. Derick Rethans 7:57 The RFC is not making this stop working, but rather it will throw a deprecation notice? George Peter Banyard 8:04 Yes. Currently, yes. Derick Rethans 8:06 Why do you say currently? George Peter Banyard 8:07 The plan is, in PHP 9, to make this a type error. Initially I wanted to make it a warning instead of a deprecation notice, but then people on the list said a warning is too strong, and it doesn't imply anything. And if you want to change this to a type error, you should make it a deprecation notice, because it means the behaviour will stop working in a later version. So that's why I changed it to a deprecation notice in the second iteration of the RFC. Derick Rethans 8:34 Because, I mean, you just said that could potentially impact already existing code.
What kind of BC issues are there with this, by introducing this deprecation warning? George Peter Banyard 8:43 There are various operators that will implicitly convert floats or float strings to integers. So those are the bitwise operators, the shift operators, the modulo operator, and the assignment operators of the above. If you try to assign a float to an integer typed property, or if you try to pass a float to an integer typed parameter, or as a return type, those will show deprecation notices. And then only for floats, not float strings, there is the bitwise NOT operator, because that one works with strings as well, and if you use a float string it will use the normal string semantics. And then, as an array key, because float strings are already treated as strings when used as an array key. Derick Rethans 9:28 Do you think it is better to have a deprecation notice instead of PHP silently truncating data? George Peter Banyard 9:35 Yeah, if you want that behaviour of truncating, you can always use an int cast, which will do the job for you. That makes the code explicit and tells the intention of the developer, instead of just: oh, I got a float here, pass it to an integer. Derick Rethans 9:49 What's the reaction to this been so far? George Peter Banyard 9:51 Not much reaction on the list, but the vote is currently one week in, and it's been unanimously approved, so I'm pretty happy that most people are for it. Derick Rethans 10:02 It's always good to hear unanimous agreement, maybe I should switch my vote to No. As you have said the reaction has been fairly good. And obviously this RFC passed, so the reaction was good enough for this to pass. Do you think there will be some follow up RFCs for ironing out more things like this? George Peter Banyard 10:20 Possibly, I don't know if I'll get them into PHP 8.1. Because time, and I've got some other projects.
But I think, maybe, as you said, I've just learned that some integers lose precision as floating point numbers, which I wasn't aware of. What's maybe a bit more controversial is to change the behaviour of casting floats which don't fit into the integer range, to now produce max int or min int instead of zero. You would need to put deprecation notices or warnings on the use of an explicit cast, and I don't know how people will feel about that. Derick Rethans 10:58 I see what you mean there. It will be an interesting discussion for when that happens I would say. George Peter Banyard 11:04 Yep. Derick Rethans 11:05 Would you have anything else to add about this RFC itself? George Peter Banyard 11:08 Not really, it's mostly straightforward. All the details are in the RFC, all the BC breaks are in the RFC. If you're an extension maintainer, there's only one BC break, with a function. When you take a zval and you convert it to an integer, you'll get a notice, and I expect most extension maintainers want their users to know that this is going to throw at a later point. But you can also then do it manually if you want to support this behaviour implicitly in your extension. Derick Rethans 11:36 I think it is important for extensions that behaviour doesn't suddenly change, but forcing an API change on them is often a better way than deciding to change an existing API, I think. George Peter Banyard 11:47 The problem is that the API I'm using is used all over the PHP source code, and changing that everywhere felt like a bit of a hassle, but I've added a C function, "is long compatible", so you can check in advance whether it will do stuff like that. And then there's also a version which doesn't emit any notices, so you can do it anyway. Derick Rethans 12:08 And that is a new function I suppose? George Peter Banyard 12:10 Yes.
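As a rough illustration of the deprecation described in this episode, the sketch below runs the modulo operator (one of the operators George lists) on a float with and without a fractional part, and then with an explicit int cast. It assumes PHP 8.1 or later, where the RFC's deprecation is in effect; the variable names are made up for the example.

```php
<?php
// Collect deprecation messages instead of printing them.
$deprecations = [];
set_error_handler(function (int $severity, string $message) use (&$deprecations): bool {
    $deprecations[] = $message;
    return true; // handled, skip PHP's default output
}, E_DEPRECATED);

$whole      = 16.0; // integer-compatible float
$fractional = 15.5; // has a fractional part

$a = $whole % 3;              // 1, no notice: 16.0 converts to 16 losslessly
$b = $fractional % 3;         // 0, but PHP 8.1+ reports a deprecation
$c = ((int) $fractional) % 3; // 0, the explicit cast states the intent

restore_error_handler();

var_dump($a, $b, $c);
var_dump(count($deprecations)); // number of deprecations collected
```

The explicit `(int)` cast is the escape hatch mentioned in the conversation: the truncation still happens, but the code now says so.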
Derick Rethans 12:11 I think it's something that extension authors should look at in any case. I mean, we have this lovely UPGRADING.INTERNALS file, where this certainly should fit in as well, I suppose. George Peter Banyard 12:22 Yeah, it'll fit in. It's currently not that big as a file; it usually gets big a bit before feature freeze, because all the changes land then. Derick Rethans 12:30 I know how this goes. This is also exactly when Xdebug starts breaking again because of API changes. So far I have been lucky there, so there's not been too many in PHP 8.1. Do you know actually how much time there is until feature freeze? George Peter Banyard 12:45 I would imagine it's end of July, as usual, that's the usual timeline. I don't know, because RM selection hasn't happened yet, so I don't know how long that usually takes. Derick Rethans 12:54 You're talking about RM for release manager selection here. Once this happens I hope to talk to the new release managers as well, and get them to introduce themselves here. George Peter Banyard 13:02 Seems like a good idea. Derick Rethans 13:03 And to chat about any favourite things for PHP 8.1. All right, George, thank you for taking the time this afternoon to talk to me about another tweak to PHP's handling of numbers in general, and I'm sure it won't be the last one. George Peter Banyard 13:19 Thanks for having me, and I'll talk to you soon. Derick Rethans 13:22 Hopefully in a pub with a pint. George Peter Banyard 13:24 Yeah, that would be nice. Derick Rethans 13:27 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.
Show Notes RFC: Deprecate implicit non-integer-compatible float to int conversions RFC: Saner Numeric Strings Episode #62: Saner Numeric Strings Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 82: Auto-Capturing Multi-Statement Closures
London, UK Thursday, April 22nd 2021, 09:10 BST In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Nuno Maduro (Twitter, GitHub, Blog) about the "Auto-Capturing Multi-Statement Closures" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 82. Today I'm talking with Nuno Maduro and Larry Garfield. Nuno, would you please introduce yourself? Nuno Maduro 0:30 Hi PHP developers. My name is Nuno Maduro, and I am a software engineer at Laravel, the company that owns the Laravel framework, and I have created multiple open source projects for the PHP community, such as Pest PHP, Laravel Zero, Collision and more. Derick Rethans 0:48 Alright, and Larry, could you please follow up on that? Larry Garfield 0:51 Hello world, I'm Larry Garfield. You may know me from several past podcasts here, various work in the PHP-FIG, and all around gadfly and nudge of the PHP community. Derick Rethans 1:03 Good to have you again Larry, and good to have you here today, Nuno. The RFC that we're talking about here today is to do with closures, and the title of the RFC is Auto-Capturing Multi-Statement Closures, which is quite a mouthful. Can one of you explain what this RFC is about? Nuno Maduro 1:20 As you said, the RFC title is indeed Auto-Capturing Multi-Statement Closures. But to make it simple, we are really talking about adding multi line support to the one line arrow functions that got introduced in PHP 7.4.
Now, this new multi line arrow functions have exactly the same features as the one line arrow functions, so they are anonymous, locally available functions; variables are auto captured lexically meaning that you don't actually need the use keyword to manually import the use of variables, they just get auto captured from the outer scope. And the only difference really is one line arrow functions have a body with a single expression. This RFC actually allows you to use a full statement list that possibly ends with a return. Derick Rethans 2:18 Excellent, what the syntax that you're proposing here? Nuno Maduro 2:22 Well, as you may know, one line arrow functions have the syntax, which is fn, parameter list, and then that arrow expression thing, and this new RFC proposes that, optionally, developers can pass a curly brackets with statements, instead of having that arrow expression syntax. Now, this curly brackets with statements, simply denotes a statement list that potentially ends with a return. Concerning the Auto Capture syntax, we will be just reusing the Auto Capture syntax and feature that already exists on one line arrow functions, meaning that you don't need the use keyword to manually import variables. And of course, the syntax itself, in the in the feature, works the exactly same way. Concerning the syntax, it's also important to mention that this RFC was done in combination with the short functions RFC from Larry, but I think I'm going to let Larry speak about that later on this episode. Derick Rethans 3:26 What's the main idea behind wanting to introduce this auto capture multi statement closures. Because from what I understand the arrow is now gone, and it's been replaced by just a function body within curly braces. But why would you want to extend a single expression to like multiple statements? Nuno Maduro 3:44 Well, we all know that long closures in PHP can be quite verbose, even when you want to perform a simple operation. 
And this is largely due to the syntactic boilerplate that you need when using long closures, to manually import all the variables with the use keyword. Now, while one line arrow functions solve this problem to some extent, there are also a few cases where you might want to leverage the simplicity of auto capturing, but using two or three lines in a statement list. One example I can think of is that when you are within a class method with multiple arguments, you might want to just return a closure using all the arguments of that method, and actually using the use keyword to list all of the arguments is, in this case, redundant and even pointless. There are also some use cases, for example with array_filter or similar functions, where the use keyword just adds some visual complexity to the code. We believe that the vast majority of PHP developers are really going to appreciate this RFC, as it just makes the code simpler and shorter. The community really loved these kinds of changes, as proven by property promotion, one line arrow functions, or even the null safe operator. Derick Rethans 5:10 And I think this is something that the PHP language itself is moving towards, right. I see in the RFC that you're trying to make sure that syntax stays consistent with itself, and you use a word called sigils here for some of these things. What were the important parts of making sure that the syntax stays the same? Larry Garfield 5:30 So this actually relates to the short functions RFC. So the short functions RFC, as we discussed in a previous episode, I'm trying to make it possible to write single line expression named functions in a more compact way.
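To make the boilerplate Nuno describes concrete, here is a small sketch comparing a long closure with a one line arrow function in an array_filter call. The variable names are made up for illustration. The RFC's proposed form is shown only as a comment, since at the time of the episode it is a proposal rather than released syntax.

```php
<?php
$minimum = 10;
$prices  = [4, 12, 25, 7];

// Long closure: outer variables must be imported with `use`.
$expensiveLong = array_filter($prices, function (int $price) use ($minimum): bool {
    return $price >= $minimum;
});

// Arrow function (PHP 7.4+): $minimum is captured automatically,
// but the body is limited to a single expression.
$expensiveShort = array_filter($prices, fn (int $price): bool => $price >= $minimum);

var_dump($expensiveLong === $expensiveShort); // bool(true)

// Proposed by the RFC (illustrative only, kept as a comment because
// it is a proposal, not released syntax):
//
//   $expensive = array_filter($prices, fn (int $price): bool {
//       $threshold = $minimum * 2;
//       return $price >= $threshold;
//   });
```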
And that is a different problem with different syntax than auto capturing closures, which is why we needed to work together on these, because we want to make sure that the syntax for the various different kinds of functions that PHP supports, are all consistent with each other, and this piece of syntax indicates this thing consistently, rather than: in some cases this syntax means this, and other cases that means this other thing. Didn't want to end up with that kind of mess. After the short functions RFC, one part of the feedback was, there are people working on auto capturing long closures. Y'all should work together and make sure that the syntax plays nicely. And, for various reasons I just sat on it for a while, before finally getting hold of Nuno earlier this year, and we got together and talked and made sure that what he was doing and what I was doing, the syntax complemented each other. We ended up with in our discussion and analysis is that you can have a function that is named or anonymous. It can have Auto Capture or not. And it can either be a single expression with no return statement because it just evaluates to that value, or it can be a list of statements, one of which could be returned, potentially multiple could be returned. And right now we have syntax in PHP to support three possible combinations. But there's actually eight possible ways to combine those two, those three variants. We looked at all right, we have three of the eight, which of the others makes sense to have, and the two that makes sense to have are: short functions named function, no closure, and I say an expression body is what my RFC does. And then, anonymous Auto Capture statement list, which is what Nuno's RFC does. That rounds out that list, and the other combinations, I'm not sure actually have a purpose. 
Technically could exist and this also then means, if in the future we wanted to add those, then we know exactly what the syntax is going to be for those and what they would mean it's all just following that established pattern, so it makes it really easy for people to learn and understand. So what we end up with, out of these two RFCs together, which can stand alone, and they're going to be put to votes separately, but as I said complement each other. If you have: something, double arrow, expression, that consistently throughout the language ends up meaning evaluates to this expression. Short lambda means: you know, this function evaluates to this expression, a named function, double arrow expression, array key double arrow expression, a match statements, and so on and so on and so on, double arrow always means evaluates to expression. And the key word function means declaring a function named or anonymous, that does not auto capture anything. The case of a named function, there's nothing to capture; in the case of a closure, you'd have the manual capture with a use statements exactly like we've had since 5.3. And the FN keyword means, this is a function that is going to Auto Capture. But oh fn statement list with curly braces. I know that means: Auto Capture, statement list, or keyword function with an arrow: I know that means not auto capturing expression, and all these combinations then just make sense and they're easy to learn and they fit together well. That's really what we were trying to make sure, that between these two RFCs we ended up with that consistent set of rules around the syntax that was not designed that way originally but plays out in that way, and we can now make sure it stays playing out in that way that is internally consistent, and therefore easy to learn, easy to document and so on. Derick Rethans 9:35 Because all the things that you just mentioned, they're already in place for existing syntax. Larry Garfield 9:40 Correct. 
And so we're just taking existing syntax that means a thing, and combining those existing pieces of syntax in a new way. In some cases that syntax didn't necessarily mean that deliberately, for example, the FN keyword on short lambdas, was not added for the purpose of declaring Auto Capture. It was added because we needed to have some kind of syntax there to keep the lexer happy, fine, but now that it's there, we can say: all right, that is going to now mean auto capturing function, because that's something consistent with the language as it is and the language as it evolves into. Derick Rethans 10:14 Is the intention of this new syntax to replace long closures? Larry Garfield 10:19 Not entirely. I suspect, a great many use cases of long closures could get away with then using the auto captured version, but there's no plan to remove the long closure version. Part because there are cases you do need the manual closure, particularly, it's still the only way to capture a variable by reference. All the other versions are by value, which 99% of the time is what you want, but in that other 1% If you do need to by reference, then you've still got the long version. Derick Rethans 10:48 The long version that uses the use keyword. Larry Garfield 10:51 Right, and then you're manually capturing things, are cases where you would want to, you know, use the same variable name inside and out but not refer to the same variable, so in that case you can use the long version, and then you don't have that collision. In practice however, the only languages I know of, that have explicit capture on closures are PHP and C++. As far as I know, every other language, including the other major scripting languages, Auto Capture. We're really the oddball out, and in practice, I think using Auto Capture is going to be fine. It's going to be easier, and we're not going to introduce any substantial amount of new bugs around it. 
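Larry's point about by-value auto capture versus by-reference manual capture can be seen with syntax that already exists. A minimal sketch, with an illustrative counter variable:

```php
<?php
$count = 0;

// Arrow function: $count is captured by value, so the outer variable
// is never modified.
$byValue = fn (): int => $count + 1;
$byValue();
var_dump($count); // int(0)

// Long closure with an explicit by-reference capture.
$byReference = function () use (&$count): void {
    $count++;
};
$byReference();
$byReference();
var_dump($count); // int(2)
```

This is the one percent case mentioned above: when the closure has to write back to the outer variable, the long form with `use (&$var)` remains the tool for it.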
The place where that might cause issues, is if you have a long function with a long anonymous function in the middle of it with Auto Capture and you can't keep track your variables, in which case you're already doing it wrong anyway, you should have shorter functions and shorter closures. A real use case for this is: I have a function that's a closure that's two or three lines long, because not everything in PHP can be an expression, sometimes it has to be a statement. So okay, this thing is going to be two lines long instead of one line long, but I don't want to have to convert to the super verbose long version and manually declare all of my imports, I just want to add a second line. And so this makes that use case, a lot easier and a lot more convenient. Derick Rethans 12:11 I remember when discussing the match operator, which I think I also spoke to you about? Larry Garfield 12:17 Match is the one you spoke to yourself about. Derick Rethans 12:19 That's what it was, yes. Larry Garfield 12:20 It was a fun episode. Derick Rethans 12:22 When I discussed the match operator with myself, I think I looked at whether it was possible to extend a match expression with having multiple statements on the right hand side as well, where it is currently: it's the arrow with a similar expression. Is this something that you'd be looking at to tie this auto caption closure style into as well? Larry Garfield 12:42 It's a related issue that has been discussed mainly around match, around supporting multi line expressions. And that'll be some kind of syntax which hasn't been defined yet to list a series of statements, which can then be wrapped up together and have a final statement that is returned, and then the whole thing gets evaluated, and can be used in place of an expression like in a match statement or a function body. 
If we had a syntax for multi line expressions, that would be an alternative way to get to the same place this RFC gets to, because you could say: FN, parameters, double arrow, multi line expression, with whatever syntax that ends up being. And that gets you essentially the same thing at the end of the day. Is that good or bad, I'm kind of torn on it. What we point out in the RFC, is this syntax for auto capturing multi line closures, gives us a kind of roundabout way to put a multi line body into a match arm, where what you, the single expression, that the match arm evaluates to, is a multi line closure with Auto Capture that you then immediately self execute. The syntax for that is a little bit quirky. It looks kind of like older style JavaScript. We have parenthesis, function definition, closed parenthesis, open paren, close paren, so it just executes immediately. It's not ideal for that use case. Personally I think multi line match expressions, or multi line match arms, are a rare enough need that on the rare occasion you need it, this would be good enough. And if it's not good enough, you really should break that logic out to a separate function anyway and just call that. Not everyone agrees with that. So that's more a more of an interesting side effect of this RFC than a goal per se. One have to use it in that fashion I probably not will not use that in that fashion, very often, but it's, we now have a solution for that use case, if you actually have that use case, and we don't need any dedicated syntax for that. Derick Rethans 14:46 That could be part of a future RFC if people still feel inclined, that they need that. Talking about things in the future, is there any future scope with this RFC as well? Nuno Maduro 14:57 There is really nothing planned on this RFC is future scope. Yet, there is something that I will like to personally explore in the future, that now we have this fn keyword, that means Auto Capture, or access to the outer scope. 
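The "define a closure and immediately execute it" pattern for a multi line match arm can be sketched with today's syntax (PHP 8.0+), where the captured variables still have to be listed in `use`. Under the RFC, the arm would presumably use `fn` with a braced body and no `use` list. The function and values below are invented for the example.

```php
<?php
function describe(int $status, string $body): string
{
    return match (true) {
        $status < 400 => 'ok',
        // Multi-statement arm: define a closure and call it on the spot.
        // With today's syntax the captured variables are listed in `use`.
        default => (function () use ($status, $body): string {
            $summary = substr($body, 0, 20);
            return "error $status: $summary";
        })(),
    };
}

echo describe(200, ''), "\n";                     // ok
echo describe(500, 'database unavailable'), "\n"; // error 500: database unavailable
```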
And I think something that would be very cool is to explore named functions in the way that they are declared globally, with something like fn get name, and then that function would be able to access the outer scope. But again, this is something that I personally would like to explore, and it's not included in this RFC. I just plan to explore this in the future, now that we have these possible combinations that Larry just explained. Derick Rethans 15:43 It's always interesting to see what people think of when you post an RFC to the mailing list. What were the biggest arguments against introducing this new syntax? Larry Garfield 15:54 It's interesting. For both of these RFCs, both my short functions and now the Auto Capture multi line one, the feedback from the public community on Reddit, on Twitter and so forth has been extremely positive. People love: Oh, I can write less syntax and get stuff. It's been not universally but overwhelmingly positive. The feedback on the mailing list has been decidedly mixed, with some people saying: cool, you know, I've been waiting for that, and others saying: Why? There's been push back: if your capture statement is that complex, you're doing it wrong anyway. Or, if you do have Auto Capture rather than explicit capture, your odds of capturing something you don't intend to are higher, and so you can introduce weird bugs that way. Derick Rethans 16:42 Which aren't really arguments against having a multi line auto capture closure with two or three statements. I mean if you're putting 50 lines in there then sure, you can make that argument I guess. Larry Garfield 16:53 Exactly, and that's kind of our response: if you have a complex enough piece of code that Auto Capture becomes problematic, you have a complex enough piece of code that you really should be manually capturing, or just refactor your code so you don't have that much complexity. In that sense it kind of becomes a good indicator of a need to refactor.
Then there's always the argument of: why should you add more syntax for anything, you know, we've got one syntax let that rule everything and that's that comes up with every RFC. Points are valid, to an extent, but I think the convenience factor of being able to write code more naturally with less effort that does stuff that right now is just clunky, is a stronger argument, especially given that most other languages don't have manual capture and get along fine. People have mentioned JavaScript as an example where the Auto Capture used to be highly problematic, then resolved with an extra keywords you can declare a variable inside a closure with let, that is then locally scoped and overrides anything in a parent scope. I don't think PHP needs that in part because we don't use closures as overwhelmingly as JavaScript does, and honestly that problem has kind of gone away in JavaScript, as they've introduced real classes, and other more traditional techniques that obviate the need for those kind of closure inside closure inside closure inside the closure nonsense. Python doesn't actually have multi line lambdas as far as I'm aware, because they have named functions that are scoped local. Ruby, as far as I know just does Auto Capture and doesn't have any special syntax around it. So, I have not heard of them having any problems. As I said, C++ has manual capture and that's the only one I can think of that has it. I think looking at other languages, the problems people have pointed out are more hypothetical than real, and I'm hoping that, you know, voters on the list will see all right. This makes life easier and the problems with it are hypothetical, not real problems that we've seen in the wild. So let's just make life easier for people. Derick Rethans 19:05 Is there any chance of this breaking BC, somehow? Larry Garfield 19:08 It shouldn't, the syntax right now would be syntax error. I don't see any, any BC breaks possible. 
Derick Rethans 19:16 That's always a good thing, isn't it? Larry Garfield 19:18 Yes. Derick Rethans 19:19 You were talking about appealing to the voters on the mailing list, who have the right to vote on features. When do you think you will be putting this up for a vote? Larry Garfield 19:29 Probably around the end of April. We can probably put both RFCs up for a vote, you know, let the chips fall where they may. As I said, both RFCs are stand-alone. If one passes and the other fails, everything still works. Obviously we'd both like both to pass. Derick Rethans 19:43 And with both you mean: both this RFC, which is the Auto-Capturing Multi-Statement Closures RFC, as well as the Short Functions RFC that we spoke about in episode 69. Larry Garfield 19:53 Correct. Derick Rethans 19:54 Thank you for taking the time today to talk to me about the new RFC that you're proposing. Larry Garfield 20:00 Thank you, Derick, always good to talk. Nuno Maduro 20:01 Yeah, thank you so much for having me. Derick Rethans 20:06 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Auto-Capturing Multi-Statement Closures Credits Music: Chipper Doodle v2, Kevin MacLeod (incompetech.com), Creative Commons: By Attribution 3.0
PHP Internals News: Episode 81: noreturn type
London, UK Thursday, April 15th 2021, 09:09 BST In this episode of "PHP Internals News" I chat with Matthew Brown (Twitter) and Ondřej Mirtes (Twitter) about the "noreturn type" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 81. Today I'm talking with Matt Brown, the author of Psalm, and Ondřej Mirtes, the author of PHPStan, about an RFC that they propose to add the noreturn type. Matt, would you please introduce yourself? Matthew Brown 0:37 Hi, I'm Matthew Brown, Matt. I live in New York, I'm from the UK. I work at a company called Vimeo, and I've been working for the past six years on a static analysis tool called Psalm, which is my primary entry into the PHP world, and I, along with Ondřej, authored this noreturn RFC. Derick Rethans 1:01 Alright Ondřej, would you please introduce yourself too? Ondřej Mirtes 1:04 Okay, I'm Ondřej Mirtes, and I'm from the Czech Republic, and I currently live in Prague, or in the suburbs of Prague, and I've been developing software in PHP for about 15 years now. I've also been speaking at international conferences for the past five years, back when the world was still alright. In 2016, I released PHPStan, an open source static analyser focused on finding bugs in PHP code bases. And somehow, I found a way to make a living doing that, so now I'm a full time open source developer, and also father to two little boys. Derick Rethans 1:35 Glad to have you both here. We're talking about something that clearly is going to play together with static analysers.
Hence, I found it quite interesting to have two competing projects here, or are they competitive, or are they cooperative? Matthew Brown 1:56 I think half and half. Derick Rethans 1:57 Half and half. Okay. Ondřej Mirtes 1:59 Competition is a weird concept in open source, where everything is released for free. Derick Rethans 2:04 That's certainly true, but you said you're making your living out of it now, so maybe there is something going on that I'm not aware of. In any case, we should probably chat about the RFC itself. What's the reason why you're wanting to add the noreturn type? Ondřej Mirtes 2:18 I'm going to start with a little bit of a detour, because in recent PHP development, it has been a trend to add the ability to express various types natively in the language syntax. These types always originally appeared in PHPDocs for documentation reasons and IDE auto completion, and later were also used by static analysis tools. This trend of moving PHPDoc types to native types started probably with PHP 7.0, which added scalar type hints. PHP 7.1 added void and nullable type hints, 7.2 added the object type, 7.4 added typed properties. And finally, PHP 8.0 added union types. Right now the PHP community most likely waits for someone to implement generics and intersection types, which are also widely adopted in PHPDocs, but there's also noreturn, a little bit more subtle concept, that would also benefit from being in the language. It marks functions and methods that always throw an exception, or always exit, or enter an infinite loop. Calling such a function or method guarantees that nothing will be executed after it. This is useful for static analysis, because we can use it for type inference. I have an example: when you're accepting a nullable object as a function parameter, you probably want to eliminate the null value before you can safely call a method on it.
So, you will write if $object, three equal signs null, somehow handle this situation, and at the end of the if statement, you will return, or throw an exception. But instead of return, or throw, you might choose to call a framework specific or a library specific function, that also always throws or exits the process. This will also tell the user, the IDE, and the static analyser, that below the if statement, the variable can no longer be null. For example, if you ever called markTestSkipped in PHPUnit, or if you call the abort function in Laravel, you've already used a function that would benefit from being marked with the noreturn keyword. Derick Rethans 4:24 You mentioned that currently people use the docblock annotation @noreturn for that. Why would it be better to have it in the language? Matthew Brown 4:31 Jumping off Ondřej's point: PHP has this thing, right, where you can put things in the docblock, but PHP is also a language where developers are used to the language telling them if they did something wrong. Whereas other languages, for example JavaScript, can be a bit more permissive, developers writing PHP code are used to getting errors instantly. They call a function with an object instead of a string, and it expects a string, and it's marked in the signature as expecting a string; when they run that they get an error. And so that's just the way that most PHP developers write code. With a noreturn type, we sort of thought that there's a benefit to a developer, having written a noreturn type, instantly getting an error if they actually do something that returns. So it follows that pattern that PHP has adopted, of: if I do something that violates a type that I've annotated, that I've explicitly added to the function, PHP should error. There's also a useful sort of side effect here, which is that when you add noreturn to a function, it's guaranteed that it will never return to the calling context. 
If you call it, it will never return, because it will always either throw an exception or exit: if the noreturn is invalid, if the function does actually do something where it's returning somehow, PHP will then throw a TypeError, because it's supported by the language. If it wasn't supported by the language, you'd be able to use a function that is marked noreturn, and it could actually return. I mean, obviously Ondřej and I are big fans of static analysis. Adding this to the language itself isn't just to pat ourselves on the back and think, you know, we had the right idea when we were doing static analysis; it's because it can help PHP developers write code. Derick Rethans 6:16 The void return type can only be used in specific locations, I mean you can't type a property as void, for example. Are there similar restrictions on where noreturn can be used? Ondřej Mirtes 6:27 Yeah, right now it can be used just as a return type. There might be some other possible usages, but they are not part of this RFC. For example the noreturn bottom type could be used as a parameter type to denote a function that shouldn't be called. So, this might be a relevant use case; I've already had a feature request for PHPStan to actually support this type as a parameter type for callbacks that should never be called, but I don't remember why that person wanted this. Once we have generics, or at least the possibility to type what's in an array, we could also use the noreturn type for that. For example, an array that contains noreturn, or never, would mean that the array is empty. And also during static analysis, the type inference engine produces this type internally, basically to mark dead code. So for example if you ask, about a variable that can only ever contain an integer, whether that variable can be a string, you're creating a condition that cannot be executed, that will always be false, and the resulting type of the variable inside that condition is the same type as noreturn or never. 
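The null-elimination example and the runtime check discussed above can be sketched as follows. This is only an illustration under assumptions: it targets PHP 8.1 or later and uses the spelling never (the name that was leading the naming vote in this episode), and Customer, abortWith, brokenAbort and greet are hypothetical names invented for the example, not anything from the RFC.

```php
<?php
declare(strict_types=1);

final class Customer
{
    public function __construct(public string $name) {}
}

// Hypothetical helper: it always throws, so control never comes back to the caller.
function abortWith(string $message): never
{
    throw new RuntimeException($message);
}

// Hypothetical helper that breaks its own promise by falling off the end.
function brokenAbort(): never
{
    // No throw and no exit here; PHP raises a TypeError when this function tries to return.
}

function greet(?Customer $customer): string
{
    if ($customer === null) {
        abortWith('A customer is required');
    }
    // Below the if statement, $customer can no longer be null.
    return 'Hello, ' . $customer->name;
}

echo greet(new Customer('Ada')), "\n";

try {
    greet(null);
} catch (RuntimeException $e) {
    echo 'Aborted: ', $e->getMessage(), "\n";
}

try {
    brokenAbort();
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), "\n";
}
```

A static analyser can use the never return type on abortWith to conclude that $customer is non-null after the if block, while the engine itself enforces the promise at runtime, as brokenAbort shows.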
Derick Rethans 7:41 You mentioned never there, which we haven't spoken about yet, but we'll get back to that in a moment I'm sure. Is there any prior art for this? Matthew Brown 7:47 Yes, a number of languages have a noreturn type. Hack specifically has a noreturn type. Hack, if anyone listening doesn't know, is a language created as a sort of offshoot of PHP. Engineers at Facebook were running into issues with PHP from about the moment they started using it in 2007/2008, as the site started growing and performance really became an issue. And so eventually they created their own version, basically. And one of the benefits of working at Facebook is, you have lots and lots of smart engineers, and they added a lot of different typing functionality to this new language. And so one of the things they added was a noreturn type, as well as adding generics and many, many other things. Another language with prior art is TypeScript. TypeScript has a never type, which is essentially the same. It's a bottom type, as Ondřej was talking about. And a bottom type is the subtype of all subtypes. You have a class structure, you have exception, and then you have a child class of logic exception, and noreturn is the subclass of subclasses of the child class, the thing right at the bottom of the type hierarchy, and so it can always be returned when you would expect some other thing. But basically, this is the understanding of what a bottom type is. I talked about two interpreted languages, but there are also many compiled languages, most recently Rust, that have the notion of a bottom type. It's a type where you're guaranteed that program execution ends, in some way, shape or form. Derick Rethans 9:23 You mentioned that noreturn is the bottom type, how does that play with variance that PHP implements? 
Matthew Brown 9:32 The concept of variance for return types is essentially: if a parent method returns something like an exception class, the child classes can either return an exception class, or, for that same method, they can return children of the exception class. So let's say I have a method getException, that is described as returning an exception. The child methods in our child class, so child::getException, can either return an exception, or they can also say they return a child class, so they can say, I actually return a logic exception, and this is valid according to the Liskov substitution principle, which is to say: you're allowed to return a child type of whatever the overridden method returned. So where this comes into play with noreturn is: noreturn is defined as the bottom type, it is at the very bottom of all those class structures, so you can always return a bottom type, basically. And this makes sense if we just think about it: you're not breaking a contract if your function always throws or exits; the variance rules kind of follow from that. Derick Rethans 10:43 How would that compare with void? Because void has some interesting variance rules as well, right? Ondřej Mirtes 10:49 Actually, there are no or few similarities between void and noreturn. Because when you are calling a void function or method, you expect it to do something, and then actually continue in the execution flow. You don't expect to read the return value, but with noreturn, you call a method, and you don't expect the execution flow to continue. These are completely different, and I actually don't know how people can mistake one for the other. Derick Rethans 11:22 Yes, seems very, very different to me as well. The RFC talks about alternative ways of introducing noreturn. One of the things it mentions is using an attribute, attributes having been introduced in PHP 8.0. Why did you decide not to implement it as an attribute, or suggest it as an attribute instead? 
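The return-type variance Matthew describes above can be sketched like this. It is a minimal illustration, assuming PHP 8.1 or later and the spelling never; the class names ErrorSource, StrictErrorSource and ThrowingErrorSource are hypothetical.

```php
<?php
declare(strict_types=1);

class ErrorSource
{
    public function getException(): Exception
    {
        return new Exception('generic failure');
    }
}

class StrictErrorSource extends ErrorSource
{
    // Covariant override: LogicException is a child class of Exception.
    public function getException(): LogicException
    {
        return new LogicException('logic failure');
    }
}

class ThrowingErrorSource extends ErrorSource
{
    // never is the bottom type, so it is accepted where Exception was promised.
    public function getException(): never
    {
        throw new RuntimeException('no exception object is ever returned');
    }
}

echo get_class((new StrictErrorSource())->getException()), "\n";

try {
    (new ThrowingErrorSource())->getException();
} catch (RuntimeException $e) {
    echo 'Thrown instead of returned: ', $e->getMessage(), "\n";
}
```

A caller typed against ErrorSource is never handed a value of the wrong type by either child: it receives a narrower exception, or it receives nothing at all because the call does not return.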
Matthew Brown 11:43 Attributes I think are really cool. I think attributes have a place in the language, obviously they have a place, as the RFC described, in place of docblocks, and they can be reflected very quickly at runtime. And I'm also interested in ideas like a deprecated attribute. And also I've just been kind of toying around in my head with the idea of a pure attribute, which could guarantee at runtime that a function with that attribute was pure. It would never, for example, use a property, or it would never use something like a static variable. We could guarantee purity of functions, which would interest the pure functional programming people. Derick Rethans 12:26 Could you explain what a pure function is? Matthew Brown 12:28 A pure function is a function that doesn't use any other data but the data you provide it. If I have a multiply function that takes two parameters, x and y, and it returns the multiplication of those things, you would call that function pure. There are many ways the function can become impure. One of the ways is it can have output, you can have IO output for example, so if in the body of the function you then echo the value of x before returning, that function becomes impure because it's changed the environment that it operated in slightly. Additionally, if you memoize the value of x. So let's say you have x and y as inputs, and then in the first line, you take the value of a property elsewhere, and you add x to that value, and then you multiply that result, then that function is also impure: you're using data from outside the function to return this value. So the idea of a pure function is one which essentially can be modelled mathematically, and that's why some kinds of purists like this idea, because it allows things to be modelled mathematically, but more importantly, it allows those functions to be tested very effectively. 
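The three cases from this explanation (a pure multiply, a version with output, and a version that reads a property) could look roughly like the sketch below. The function and class names are hypothetical and chosen only to mirror the spoken example.

```php
<?php
declare(strict_types=1);

// Pure: the result depends only on the arguments and nothing else is touched.
function multiply(int $x, int $y): int
{
    return $x * $y;
}

// Impure: echoing changes the environment the function runs in.
function multiplyAndEcho(int $x, int $y): int
{
    echo $x;
    return $x * $y;
}

final class Scaler
{
    public int $offset = 10;

    // Impure: the result also depends on a property outside the arguments.
    public function multiplyWithOffset(int $x, int $y): int
    {
        return ($this->offset + $x) * $y;
    }
}

$scaler = new Scaler();
$before = $scaler->multiplyWithOffset(3, 4); // (10 + 3) * 4
$scaler->offset = 0;
$after = $scaler->multiplyWithOffset(3, 4);  // (0 + 3) * 4

echo multiply(3, 4), ' ', $before, ' ', $after, "\n";
```

The same arguments give two different results from multiplyWithOffset, which is exactly why the pure version is the easier one to test.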
Psalm implements purity checks, so that you can add a docblock annotation to a function and it will tell you whether or not the function is pure. This has extra benefits when writing really complex code. So the most complex code that Psalm has, which performs some boring computation, I've added these pure annotations everywhere. And what it does, it forces me to write the code in a way that avoids side effects. The hope is from my end that writing this very complicated code in a pure fashion makes it easier to debug at some later date. Derick Rethans 14:20 Thanks for that. Matthew Brown 14:21 I think attributes are great and have these uses. I don't believe that attributes are useful to encode types, because PHP has a place where you can already represent types; you know, we introduced the notion of typing into the language itself many years ago. I think there is a benefit to, where possible, keeping the types as types. There was a suggestion that noreturn could be an attribute instead, because in some way it's really about behaviour. But it's still a type, and in the wider programming community, there is prior art for it to be considered a type. So there's basically no benefit, to my mind, in making it an attribute. And as well, the implementation as a type is very small, you know, it's well under 100 lines of actual written PHP to implement this feature, because it uses the existing checks that we already use for other return types, and to make it an attribute we would kind of have to take that out and very much expand the implementation. There are two good reasons there to not want to use an attribute. Ondřej Mirtes 15:31 There are not very useful combinations. If noreturn was an attribute, then what would you write as a return type? There are not many useful combinations of what it could be. Derick Rethans 15:44 And it also can't be used with any kind of variance rules any more. 
Matthew Brown 15:48 Or at least if it were to be used for variance rules then we would have to write that logic. You'd be like, why are we writing this logic in this particular way, it wouldn't make sense. Derick Rethans 15:57 Because noreturn is a type, and not a behavioural thing. Makes perfect sense. Matthew Brown 16:03 But it's both a type and a behaviour. In the same way that when you actually say, this function returns a thing, PHP then does a behavioural check to make sure that that function always returns. You could argue that every type is essentially a behaviour, because you're saying the behaviour of this function has to return a value. Derick Rethans 16:21 Earlier, one of you mentioned, instead of noreturn, the never keyword. Is that the only alternative that was discussed, or are there further ones? Ondřej Mirtes 16:32 Well there's noreturn and never, and the RFC is now going through the voting process, so the secondary vote is about the name, and some languages also use nothing. It feels more natural to say that a function never returns, or to use the noreturn keyword, than to say that it returns nothing, which lands closer to the void keyword. Derick Rethans 16:58 Earlier you were mentioning that for future scope you wanted to use this new keyword that you're suggesting to introduce also in other locations, where perhaps noreturn does not make sense. Ondřej Mirtes 17:08 Yes. Also, what noreturn has going for it is that it's unlikely to be used as a class name somewhere, so making it a reserved keyword isn't an issue. But just as you said, it looks like a keyword for a single purpose, being written in a return type. I think it's quite obvious which one of us two likes which keyword, because I like never more. And one reason is that it's a single word and it reads more naturally in the source code, and it also looks more like a full fledged type, and TypeScript uses the same keyword. Derick Rethans 17:42 Why did you put noreturn in the RFC? 
Ondřej Mirtes 17:44 Because Matt likes it more. Matthew Brown 17:47 Yeah, I wrote the first draft of the RFC, I got first dibs, but this is a big point of contention with Ondřej and I, and we're almost at the point of not speaking to each other, because I'm on one side and he's on the other. And it looks at the moment like never will succeed. I think the TypeScript thing is a good point. When I wrote the RFC originally, I wasn't thinking that so many PHP developers write TypeScript. I hadn't really factored into my head. And I think, given that it does make more sense that never is used. Derick Rethans 18:21 Looking at how recurrent voting is going, never has 32 votes going for it, and no return has 14 votes going for it. Ondřej Mirtes 18:29 Just kidding. I can't wait to have a beer with him again, once the world is, is fine again. Derick Rethans 18:35 Me as well. Matthew Brown 18:36 He can't start inventing new words; like yeah ironically naming is hard right. Derick Rethans 18:41 Definitely the case. At the moment it's very clearly looks like that, the new keyword is going to be never, with 40 votes for introducing a keyword to begin with and 10 against, so that looks like a done deal. Would either of you have anything else to add? Ondřej Mirtes 18:57 Yeah, Derick, last time I refresh the wiki, I noticed that you haven't voted yet so what is going to be your vote? Derick Rethans 19:04 I intend not to vote until I've spoken to the people on the podcast. Matthew Brown 19:09 Great, great. Derick Rethans 19:10 I will make sure to vote. Having said that, thank you very much for taking the time today to talk to me about this RFC. Matthew Brown 19:17 Thank you. It was a pleasure. Ondřej Mirtes 19:19 Yeah, I've been following this podcast closer since the beginning, so I'm happy I was able to finally join, and have something to talk about here. Thank you. 
Derick Rethans 19:26 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: noreturn keyword Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 80: Static Variables in Inherited Methods
PHP Internals News: Episode 80: Static Variables in Inherited Methods London, UK Thursday, April 1st 2021, 09:08 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Static Variables in Inherited Methods" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick, welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 80. In this episode I speak with Nikita Popov again about another RFC that he's proposing. Nikita, how are you doing today? Nikita Popov 0:30 I'm still doing fine. Derick Rethans 0:33 Well, that is good to hear. So the reason why you're saying, I'm still doing fine, is of course because we're basically recording two podcast episodes just behind each other. Nikita Popov 0:41 That's true. Derick Rethans 0:42 If you'd been doing fine 30 minutes ago and bad now, something bad must have happened and that is of course no fun. In any case, shall we take the second RFC then, which is titled static variables in inherited methods. Can you explain what this RFC is meant to improve? Nikita Popov 1:00 I'm not sure what this is meant to improve, it's more like trying to fix a bug, I would say. This is a really, like, technical RFC for an edge case of an edge case, so I should say first, when I'm saying static variables, I'm not talking about static properties, which is what most people use, but static variables inside functions. What static variables do, unlike normal variables, is that they persist across function calls. For example, you can have a counter, static $i equals zero, and then increment it, and then each time we call the function, it gets incremented and the value from the previous call is retained. 
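The counter Nikita describes can be sketched in a few lines. The function name nextTicket is hypothetical; only the static $i = 0 pattern comes from the conversation.

```php
<?php
declare(strict_types=1);

function nextTicket(): int
{
    static $i = 0; // initialised once, then retained between calls
    $i++;
    return $i;
}

$first  = nextTicket();
$second = nextTicket();
$third  = nextTicket();

echo "$first $second $third\n"; // prints "1 2 3"
```

A normal local variable would be reset on every call, so each call would return 1.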
So that's just the context of what we're talking about. Derick Rethans 1:43 Why would people make use of static variables? Nikita Popov 1:46 I think one of the most common use cases is memoization. Derick Rethans 1:50 Can you explain what that is? Nikita Popov 1:51 If you have a function that that computes some kind of expensive result, but which is always the same, then you can compute it only once and store it inside the static variable, and then return it. Maybe possibly keyed by by the function arguments, but that's the general idea. And this also works if it's a free standing function. So if it's not in the method where you could store state inside the static property or similar, but also works inside a non method function. Derick Rethans 2:22 The keyword here in his RFC's title is inherited methods, I suppose. What happens currently there? Nikita Popov 2:29 There are a couple of issues in that area. The key part is first: How do static variables interact with methods at all? And the second part is how it interacts with inheritance. So first if you have an instance method, with a static variable, then some people expect that actually each object instance gets a separate static variable. This is not the case. The static variables are really bound to functions or methods, they do not depend on the object instance. Second problem is: What happens with inheritance? So you have a parent class with a method using static variables and then you have a child class that inherits this method. There are two ways you can interpret this, either this is still the same method, so it should use the same variables, or you could say okay the inherited method is actually a distinct method and should use separate variables. PHP currently follows the second interpretation. Derick Rethans 3:24 And is this even the case, if it's overridden, or just when it's inherited? Because there's a difference there supposed as well. 
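Two points from this exchange, memoization and the fact that static variables belong to the method rather than to each object, can be illustrated together. This is a sketch with hypothetical names (PriceList, total, computations); the class is final so that inheritance does not enter into it.

```php
<?php
declare(strict_types=1);

final class PriceList
{
    // Per-object counter, present only to show which instance did the work.
    public int $computations = 0;

    public function total(): int
    {
        static $total = null; // one slot shared by all PriceList instances

        if ($total === null) {
            $this->computations++;
            $total = array_sum(range(1, 100)); // stand-in for an expensive computation
        }

        return $total;
    }
}

$a = new PriceList();
$b = new PriceList();

$fromA = $a->total(); // computes and stores the value
$fromB = $b->total(); // reuses the value stored during the call on $a

echo "$fromA $fromB {$a->computations} {$b->computations}\n"; // prints "5050 5050 1 0"
```

If each object had its own copy of the static variable, $b would have had to compute the total again and its counter would read 1.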
Nikita Popov 3:30 Yeah, this is the basic model that PHP tries to follow but there are quite a few edge cases. One is what you mentioned: if you override the method, then of course, you're calling the overridden method so the static variables don't even come in. But if you then call the parent method, then usually you would expect, if you do an override and just call the parent, that the behaviour is exactly the same as if you didn't override it at all. That's not the case here, because now you're calling the parent method. So the problem you have here is that if you didn't override the method, then the child method and the parent method would have different static variables. Now if you call parent, then you're just calling the parent method, so you get back to one set of static variables, which are the same for both methods. You can't really say they're the same for both methods; rather, because you're calling the parent method, there is only one method involved, only one set of static variables involved. You can't really just, like, seamlessly extend a method that uses static variables without changing the behaviour by accident. Derick Rethans 4:37 This sounds all very complicated. Nikita Popov 4:39 Yes, as I said, I did warn you that this is an edge case of an edge case. Derick Rethans 4:44 But I think the whole idea behind the RFC is to make it less complicated. Nikita Popov 4:48 Yes, this is, like, not the only issue you can run into. There is another one that we have actually addressed separately, but which still exists on earlier PHP versions, which is that the value of the static variables depends on the time of inheritance. Let me be a bit more explicit there. What we had in previous versions is that you have your parent method with static variables, then you call that parent method, the static variables change, then you inherit it. And again, call the inherited method. In that order: first declare the parent, call it, declare the child, call it. 
In that case we will actually take the static variables at the time when the inheritance actually happens. If the first call to the parent method modified the static variables, then we will use those modified variables. From that point on, the child method will have a separate copy, but it will, like, pick up the modifications made before inheritance happened. Now in PHP 8.1, we actually already fixed that so that we always use the original values, but this is just one more thing on the list of weird things that happen if you use static variables inside methods and you inherit them. Derick Rethans 6:06 I think I understand more of it now. Nikita Popov 6:08 I think you understand more than you ever wanted to know. Derick Rethans 6:12 You've mentioned the edge cases. What is the result going to be once this RFC passes, which I think is quite likely? Nikita Popov 6:21 The result is, hopefully, simpler than what we have, namely that static variables are really bound to a specific function or method declaration. If you have one method using static variables, then you have only one set of static variables ever. If it's inherited, you still reuse the same static variables because there is no separate inherited method. It's just the same method in the child class. That's the concept. Derick Rethans 6:52 And if you override it in an inherited class? Nikita Popov 6:55 If you override it, and you call the parent method, then the behaviour is unchanged because you still have just a single set of static variables, so there is no edge case here any more because the child method never had a separate set. Derick Rethans 7:10 But if the overridden method also defines its own static variable with the same name? Nikita Popov 7:17 That's possible. 
In that case, once again the rule is that each method has its own static variables: the parent method can have static variables and the child method can have them as well, if it is overridden, and there are no name clashes between them. Derick Rethans 7:32 Because they are going to be totally separated, meaning that any code you run in the inherited method will only affect its static variables, and any code that runs in the original method only affects the static variables that are bound to that specific method. Nikita Popov 7:50 Exactly. I mean, in the end, static variables are really a type of global state, just a type that is kind of isolated to a specific namespace and doesn't cause clashes, so in that sense, it's important that these things are isolated. Derick Rethans 8:06 And that would also make the behaviour a lot easier to explain than it currently is. Because every method has its own set of static variables. Nikita Popov 8:15 Yes. Derick Rethans 8:15 Or I should say, every declared method has its own set of static variables. Nikita Popov 8:20 I guess that is an important distinction. If you can see the method inside your code and see the static variable inside it, then that is a distinct one. If we ignore the exception of traits. Derick Rethans 8:34 You're going to have to explain that as well. Nikita Popov 8:37 Well, traits are always a special snowflake. Our general model for traits is that they are compiler assisted copy and paste. So a trait should roughly behave as if you just copied all the methods into the class that's using the trait. And in that sense, if you are actually copying the code of your method with static variables, then it should also use a distinct set of static variables for each use. And that is also how it is proposed to behave. So that is like the one exception where you have a single method declaration in your code, but each using class will get a separate set of static variables. 
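The rule and its one exception can be put side by side in a short sketch. All names here are hypothetical. The inherited case depends on the PHP version: the comment assumes the behaviour proposed in this RFC, which is what PHP 8.1 and later do, while earlier versions give the child its own copy.

```php
<?php
declare(strict_types=1);

class ParentCounter
{
    public static function next(): int
    {
        static $n = 0;
        return ++$n;
    }
}

// Inherits next() without overriding it.
class ChildCounter extends ParentCounter {}

trait CountsCalls
{
    public static function tick(): int
    {
        static $n = 0;
        return ++$n;
    }
}

// Traits behave like copy and paste: each using class gets its own $n.
class First  { use CountsCalls; }
class Second { use CountsCalls; }

$parentCall = ParentCounter::next(); // 1
$childCall  = ChildCounter::next();  // 2 with the proposed behaviour, 1 before PHP 8.1

$firstTick   = First::tick();  // 1
$secondTick  = Second::tick(); // 1, separate set of static variables
$firstTick2  = First::tick();  // 2

echo "$parentCall $childCall | $firstTick $secondTick $firstTick2\n";
```

With one method declaration there is one counter across the class hierarchy, whereas the single trait declaration yields one counter per using class.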
Derick Rethans 9:15 Because the code is copied in place, instead of linked, or used in place. It's also the case for all the methods declared in traits, they're also copied into the same symbol table as the methods belong to a class. Nikita Popov 9:32 Yeah, that's right. Derick Rethans 9:33 Should be reference counted in some way because you probably won't duplicate the exact data. Nikita Popov 9:38 We of course don't actually copy the methods, or at least most of the methods, but from the programmer perspective that's how it works Derick Rethans 9:46 Why do you say most of the methods, and not all the methods? Nikita Popov 9:50 We separate two things there. There is the method itself. So the op-array, and then there is all the stuff it uses like the opcodes the like arguments information and so on. What they do for traits, is we share all the data, and only create a separate op-array. Reason is that there are some differences. For example, we have to adjust the scope, we have to adjust possibly the function name if aliases are involved, and we have to adjust the static variables. So it's like kind of a partial copy we do. Derick Rethans 10:22 Which is probably the most efficient way of doing it? Nikita Popov 10:24 Yes. Derick Rethans 10:25 Because this RFC is changing behaviour due to bug fixes, I would probably argue, what kind of backwards compatibility issues are there? And have you looked at how much code that actually impacts? Nikita Popov 10:39 I haven't looked how much code it impacts because this seems like pretty hard to really analyse. I mean I guess something we could easily check is how much static variables are used in methods at all. But it would be hard to distinguish whether this change. I mean how to distinguish in a completely automated way. Whether this change makes behavioural difference for a particular use case or not. 
So I can't really give information on that, though I would expect that impact is relatively low because the common use cases, things like memoization, they aren't affected by it, or they are only affected by it in the sense that: Then you will memoize a value only once for the whole class hierarchy instead of memoizing it once for each like inherited class. Derick Rethans 11:32 So, it's going to improve the situation there as well, is pretty much what you're saying? Nikita Popov 11:36 Yeah, but I'm sure there are also cases where the previous behaviour was like intentionally used. I mean it was never documented, but you know if some behaviour exists, people will always make use of it in the end, but I can't really say exactly how much impact this would have. Derick Rethans 11:55 Do you have anything else to add, discussing this RFC? Nikita Popov 11:59 No, I think that's it. Derick Rethans 12:00 Then I would like to thank you for taking the time today, again, to talk to me about static variables in inherited methods. Nikita Popov 12:08 Thanks for having me once again. Derick Rethans 12:16 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Static Variables in Inherited Methods Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 79: New in Initialisers
PHP Internals News: Episode 79: New in Initialisers London, UK Thursday, March 25th 2021, 09:07 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "New in Initialisers" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. As you might have noticed, the podcasts are currently not coming out once every week, as there are not enough RFCs being submitted for weekly episodes. I suspect that this will change soon again. This is episode 79. In this episode, I speak with Nikita Popov, about a few more language additions that he's proposing. Nikita, how are you doing today? Nikita Popov 0:43 I'm doing well, Derick, how are you doing? Derick Rethans 0:45 I'm pretty good as though, I always much happier when it's sunny outside. Nikita Popov 0:48 Yeah, for us to weather also turned today. Yesterday it was still cold. Derick Rethans 0:53 We're here to talk about a few RFCs. The first one, titled "New in Initializers". What is this RFC about? Nikita Popov 1:00 The context is that PHP has basically two types of expressions: ones, the ones used on normal code, and the other one in a couple of special places. These special places include parameter default values, property default values, constants, static variable defaults, and finally attribute arguments. These don't accept arbitrary expressions but only a certain subset. So we call those usually constant expressions, even though they are not maybe constant in the most technical sense. The differences really that these are evaluated separately so they don't run on the normal PHP virtual machine. 
There is a separate evaluator that basically evaluates an abstract syntax tree directly. They are just like, have different technical underpinnings. Derick Rethans 1:49 Because it is possible to for example, define a default value to seven plus 12? Nikita Popov 1:54 Exactly. It's possible to define it to seven plus 12, but maybe not to seven plus variable A, or seven plus some function call or something like that. Derick Rethans 2:03 I guess the RFC is about changing this so that you can do things like this. What are you proposing to add? Nikita Popov 2:09 Yes, exactly. So my addition is a very small one, actually. I'm only allowing a single new thing and that's using new, so you can use new, whatever, as a parameter default, property default, and so on. Derick Rethans 2:23 In this new call, pretty much a constructor call for a class of course, can arguments to be dynamic, or do they need to be constant as well? Nikita Popov 2:33 Rules are always recursive, so you can like pass arguments to your constructor, but they also have to follow the usual rules. So again, they also have to be constant expressions, or after this RFC, they can also include new, but they cannot include variable references and so on. Derick Rethans 2:50 Is this something that is being defined at the grammar level or somewhere else? Nikita Popov 2:55 We actually used to define that at the grammar level back in PHP five, but nowadays we just accept all expressions, and then we print you a bit nicer error message if it turns out it's not supported in this context. Derick Rethans 3:08 And that happens when the AST is created. Nikita Popov 3:10 Yeah, exactly. Derick Rethans 3:11 The new syntax additions is to allow new in default values in places. The RFC talks a little bit about the evaluation order, and what is this and why is is important. Nikita Popov 3:23 This is really the critical part. 
Right now, the kinds of things you can use in constant expressions are, well, as the name says, supposed to be constant, and not dynamic. This is not entirely true because, for example, you can reference a class constant, and referencing a class means that you call an autoloader, and that can run arbitrary code. Or you can trigger a notice that triggers an error handler, and that also runs arbitrary code. But at the design level, at the conceptual level, these really are constant expressions that are not supposed to have any side effects. Allowing new changes that in a big way, because new calls constructors and constructors can do whatever they want. I think it would be unusual to print out a string in the constructor, but some people might want to create a database connection there. The question of where exactly these expressions get evaluated becomes more important now, so we have to think about that a bit more carefully. Previously it was never really formally specified how the evaluation works. There are some cases where it's pretty clear cut. For example, a parameter default value of course gets evaluated when you call the function and that parameter hasn't been passed. Similarly, if you define a global constant, then the value is evaluated when you define the constant. And then there are the problematic cases, that I'm actually still not sure what to do about. The problematic cases are class constants and static properties. Because the way these work right now is that they are evaluated lazily, so when you declare the class, they are not evaluated, and then later if you use the class, at least for most uses of the class, they will be evaluated.
Derick Rethans 5:07 It would happen when you instantiate the class?
Nikita Popov 5:09 For example, yes. If you instantiate the class, if you access a static property and so on. But for example, not if you extend the class.
It happens in the cases that might potentially need the evaluated initializers. The problem here is that, of course, if your expressions can now have side effects, then it's not great if you don't have a hard guarantee of when they're actually going to be evaluated. On the other hand, what I actually implemented already, but I'm not sure it's a good idea, is to change it to evaluate the expressions eagerly, so when you declare the class, we immediately evaluate all the static properties and constants.
Derick Rethans 5:47 I think I found a problem with that as well. For example, if one of the default values is doing new DateTime, and you rely on that happening when you instantiate the object, you will get a different time than when you declare the class.
Nikita Popov 6:01 I probably should have mentioned that explicitly. When it comes to properties, we'll always evaluate when you create the object. If you do a new DateTime then you will always get a new DateTime for each object, otherwise it wouldn't really make sense. The problem with the evaluation order for static properties is that if we evaluate immediately when you declare the class, then we can run into issues with dependencies. If you're using autoloading it's usually not a problem, but if you declare classes manually, then you might have one class using a constant from a class that you declare later on. Right now, that works fine, because the initializers are evaluated lazily, but if we evaluate them immediately then this is going to throw a fatal error because it says, okay, the class hasn't been declared yet. That's a backwards compatibility break, and there are also some other issues, for example with preloading, where it's not really clear when exactly we should be evaluating things in that context. So this is a point I'm undecided on, when things should evaluate.
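The dependency problem described here can be reproduced with today's lazy evaluation. The following is a minimal sketch with hypothetical classes `Report` and `Defaults`, declared by hand in the "wrong" order. It works because the initialisers are only evaluated on first use; under eager evaluation at declaration time, the same code is the case that would break.

```php
<?php
// Hypothetical classes, declared manually without an autoloader.
final class Report
{
    // Defaults is not declared yet when Report is declared.
    public const LIMIT = Defaults::BASE + 1;
    public static int $count = Defaults::BASE;
}

// No autoloading: at this point Defaults really does not exist.
$declaredEarly = class_exists('Defaults', false);

// Wrapping the declaration in a block makes PHP declare the class only
// when this line is reached at run time, instead of at compile time.
if (!$declaredEarly) {
    final class Defaults
    {
        public const BASE = 41;
    }
}

// First use: only now are the initialisers evaluated, and Defaults exists.
$limit = Report::LIMIT;
$count = Report::$count;
echo $limit, ' ', $count, "\n";
```

The conditional declaration is only there to make the ordering visible; in a real code base the same effect comes from including files in an unlucky order.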
The options are to stick with the current lazy evaluation, make it eager, or possibly just limit the RFC to not allow new inside static properties and class constants because, at least for me personally, those are not the main use case. The three things that seem most important to me are parameter defaults, property defaults, I mean non-static property defaults, and usage in attribute arguments.
Derick Rethans 7:29 When I was reading the RFC, I was wondering: if a constructor throws an exception, for example when it's a default argument to a method, what would happen? Would the method fail, or would the method call fail, or something else?
Nikita Popov 7:45 It would behave basically the same way as if you initialized the method argument inside the method, as you would do right now, maybe with a null check. So you would just get an exception thrown inside the method, and it would fail.
Derick Rethans 8:01 And is that the same if you use the new syntax as a default value for constructor arguments as well?
Nikita Popov 8:09 Yeah, I think the situation there would be the same. The one special case is if you use it as a property default and then the instantiation fails. We treat this basically the same way as the constructor having failed, which is a slightly special situation, because we will also not call the destructor in that case. So we say that the object has been incompletely constructed, so it will not get destructed.
Derick Rethans 8:38 And of course if this is a standard class property, then this happens on instantiation of the class. How would this work for a static property?
Nikita Popov 8:50 For a static property, well, that depends again on the whole question of evaluation order. Take, for example, the way things work right now, without this proposal, where you have a static property and you're referencing a constant that doesn't exist.
Then, when you try to use the class, you get an exception saying, okay, undefined constant whatever. If you try to use it again, you still get the same exception, so you get this exception every time you use the class. This is what would happen in this case as well.
Derick Rethans 9:24 So it wouldn't happen on class declaration, but when you start using it?
Nikita Popov 9:29 Depending on where we evaluate. Either only on use, or the first time on declaration and then afterwards on each use, if you still try to use it despite the declaration having failed, which is an odd thing to do, but you would have to account for it somehow.
Derick Rethans 9:45 You know, if it is possible to do, people will find a way to do it.
Nikita Popov 9:48 Yes, certainly.
Derick Rethans 9:49 Can you talk a little bit about recursion protection as well, because the RFC talks about that?
Nikita Popov 9:54 Well, that's another edge case. Say you have, for example, class A with a property that has an initializer new A. That means when you create an object of class A and try to initialize it, you have to create another object, and then another object, and another. We do detect that situation and throw a nice exception instead of resulting in a stack overflow.
Derick Rethans 10:20 Which is beneficial.
Nikita Popov 10:21 Yes, because most people do not know what to do when PHP throws a segmentation fault, so they do prefer exceptions, usually.
Derick Rethans 10:30 I would too. The RFC also talks about some issues around traits, which I didn't quite fully understand. Would you mind explaining that to me?
Nikita Popov 10:38 The issue here is that traits can have properties, and there are rules. A rule is that if you have two traits used in the same class and declaring the same property, they have to be compatible. And compatible means effectively they have to be exactly the same. So, same visibility and same default value.
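As a small illustration of that compatibility rule, here is a sketch with two hypothetical traits that declare the same property. It assumes a PHP version with typed properties (7.4 or later) and only shows the case that is accepted.

```php
<?php
// Hypothetical traits that both declare the same property.
trait HasLimit
{
    public int $limit = 7 + 3;
}

trait AlsoHasLimit
{
    public int $limit = 7 + 3; // same visibility, type and default: compatible
}

final class Settings
{
    use HasLimit, AlsoHasLimit;
}

// Changing either default (say, to 11) or either visibility turns the
// composition into a fatal error, because the definitions then differ.

$settings = new Settings();
echo $settings->limit, "\n";
```

The check compares the declared defaults, which is cheap for plain values like these; that comparison is the step that becomes awkward once a default can create objects.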
The trouble here is that if we are dealing with an instance property which has a new expression as a default value, then we have to somehow check that these are the same. It would not be great if we actually had to evaluate the initializer to do that. I mean, it's okay if the initializer is just something like one plus two, but if it's an actual new expression we don't want to create objects, which again might have side effects and so on. What I'm specifying is that if you have a trait property with this kind of dynamic initializer, so using the new expression, then we will always consider it not compatible.
Derick Rethans 11:36 Would it currently be compatible if one of those trait properties says seven plus three, for example?
Nikita Popov 11:42 That would be compatible, which is actually, I think, a relatively new thing. We used to not evaluate initializers in traits at all and say those are incompatible, and that changed at some point in seven point something, I don't know which version. But in this case we would go back to saying it's incompatible, because at least I don't see a good way to make it compatible and I don't think it's particularly important to support that case.
Derick Rethans 12:10 Do you have any information about how much traits are actually used?
Nikita Popov 12:15 Well, I know that Laravel uses them. But I have no idea how much.
Derick Rethans 12:22 One last thing I think the RFC mentions is that it also has an effect on attributes, that it sort of gets nested attributes in by the back door. How does that work?
Nikita Popov 12:33 I wouldn't call it the back door, exactly. I have to be honest, I didn't think about attributes at all when writing this proposal. What I had in mind is mainly parameter defaults and property defaults. But yeah, attribute arguments also use the same mechanism and are under the same limitations. So now you can use new as an attribute argument.
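A minimal sketch of `new` as an attribute argument and as a parameter default follows. It assumes PHP 8.1, where the feature shipped; as far as the final RFC text goes, property defaults and class constants, which are still under discussion in this episode, were left out of the accepted version. The classes `NotNull`, `Length`, `All`, `Signup` and `Validator` are hypothetical stand-ins and not Symfony's actual API.

```php
<?php
// Hypothetical attribute classes; this is not Symfony's real API.
#[Attribute]
final class NotNull
{
}

#[Attribute]
final class Length
{
    public function __construct(public int $max)
    {
    }
}

#[Attribute]
final class All
{
    /** @var object[] */
    public array $constraints;

    public function __construct(object ...$constraints)
    {
        $this->constraints = $constraints;
    }
}

final class Signup
{
    // `new` inside attribute arguments nests one attribute in another.
    #[All(new NotNull(), new Length(max: 6))]
    public array $codes = [];
}

final class Validator
{
    // The same mechanism in a parameter default.
    public function __construct(public All $rules = new All(new NotNull()))
    {
    }
}

// Attribute arguments are evaluated lazily, here when newInstance() runs.
$all = (new ReflectionProperty(Signup::class, 'codes'))
    ->getAttributes(All::class)[0]
    ->newInstance();

$validator = new Validator();
echo count($all->constraints), ' ', $all->constraints[1]->max, "\n";
```

Because the nested objects are only created on `newInstance()`, simply declaring `Signup` has no side effects.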
And this can be used to effectively nest attributes. The example I've seen from Symfony is that they have, for example, assertions. They have an assert all attribute which wants to accept a list of assertion attributes. And now you can actually do that because you can create these attribute objects recursively. The example from the RFC is assert all, then new assert not null, new assert length max six.
Derick Rethans 13:26 That's actually kind of neat, that it just ends up starting to work, right?
Nikita Popov 13:30 Yeah, I mean, I read the Symfony thread about how they are trying to work around that. They have various ideas of how to do it and it's all pretty ugly. So I think it's nice to have a more or less proper solution for that.
Derick Rethans 13:45 They'll just have to wait until PHP 8.1.
Nikita Popov 13:48 Yes, that is the disadvantage.
Derick Rethans 13:51 Out later this year.
Derick Rethans 13:53 Are there any backwards incompatible changes?
Nikita Popov 13:56 That again comes back to the evaluation order problem. Originally I had intended this to be compatible. Now if we change the evaluation order then it is breaking. Depending on that, the answer is yes or no. I am still not sure on that one.
Derick Rethans 14:11 Because I think PHP 8.1 already has a breaking change in there, where the order of declaration of properties is now different.
Nikita Popov 14:19 Yeah, that's the change, though I hope that does not affect people too much because it's mostly about debugging functionality, which of course you are kind of interested in.
Derick Rethans 14:29 Yep, it broke my tests, which is a good thing because it means that my tests cover all the edge cases as well. I think we're sort of done discussing this RFC. Is there anything else that might end up being added here in the future, or what still needs to be hammered out before you can put it up to vote?
Nikita Popov 14:47 Apart from the evaluation order question that I have been continuously mentioning, the future scope would be to extend this to not just new expressions, but also, for example, static method calls. A popular alternative pattern is to not use constructors but named constructors, which are implemented as static methods. And similarly also function calls, for example, so you can use something like strlen() or count() inside an initializer.
Derick Rethans 15:13 Isn't strlen a language construct now?
Nikita Popov 15:15 No it isn't. It has an optimized implementation in the virtual machine, but it's still technically a normal function call.
Derick Rethans 15:23 Because I remember that breaking tests in Xdebug as well at some point, because it suddenly was no longer a function call.
Nikita Popov 15:30 Things do tend to break in Xdebug.
Derick Rethans 15:34 Okay, I'm used to it. Thank you, Nikita, for taking the time to talk about your new in initializers RFC.
Nikita Popov 15:40 Thanks for having me.
Derick Rethans 15:45 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.
Show Notes
RFC: New in Initialisers
Credits
Music: Chipper Doodle v2 - Kevin MacLeod (incompetech.com) - Creative Commons: By Attribution 3.0
PHP Internals News: Episode 78: Moving the PHP Documentation to GIT
London, UK Thursday, March 11th 2021, 09:06 GMT
In this episode of "PHP Internals News" I chat with Andreas Heigl (Twitter, GitHub, Mastodon, Website) to follow up on his project to move the PHP Documentation project from SVN to GIT, which has now been completed.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:15 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 78. In this episode, I'm talking with Andreas Heigl about moving the PHP documentation to GIT. Andreas, would you please introduce yourself?
Andreas Heigl 0:35 Hi, yeah, I'm Andreas. I'm working in Germany at a company doing PHP software development. I'm doing a lot of stuff in between as well. And one of the things that annoyed me was always having to go through hilarious ways of contributing to the PHP documentation every time I found an issue with it. So at one point in time, I thought, why not move that to Git, and, well, here we are.
Derick Rethans 1:07 Here we are five years later, right? Because we already spoke about moving the documentation to GIT back in 2019 in Episode 28. But now it has finally happened, so I thought it'd be nice to catch up and see what has actually changed and how we ended up getting here. Where would you want to start? What was the hardest thing to sort out in the end?
Andreas Heigl 1:27 Well, the hardest thing to sort out in the end was people, as probably always in software development. The technical oddities and the technical bits and pieces were rather fast to solve. What really took a long time was, well, for one thing, actually consistently working on that.
And on the other hand, chasing down people to actually get stuff done. Because one of the major things here was not the technical side but getting the bits and pieces of information together to get access to the different servers, to the infrastructure of the PHP ecosystem. Chasing down the people that want to help you is one thing, and then chasing down the people that actually can help you is a completely different one. That was for me the most challenging bit: actually getting to know who can do what and, yeah, in the end, getting access to the actual machines the whole ecosystem is running on. That was really heavy.
Derick Rethans 2:34 The system administration of the PHP.net systems is very fragmented. There are some people having access to some machines and some other people having access to other machines and, yeah, it sometimes takes some time to track down where all these bits actually run.
Andreas Heigl 2:51 One thing is tracking down where the bits run. The other one is that there is excellent documentation in the wiki, the PHP wiki, which in some parts is kind of outdated. The other thing is, if you don't actually address the different people personally, it is like screaming into the void. You can send an email to 15 different people that have access to a box, saying someone needs to do this and this and this, and everyone kind of seems to think, yeah, someone else can do that, I just don't have the time at this point in time. Things get delayed, so you're waiting for an answer for a week; you do some other stuff, so two weeks go by, four weeks go by, and suddenly two months have passed. You didn't receive an answer and, oh wait a minute, there was this project with moving the documentation to GIT, so perhaps I should have a look at that again.
Derick Rethans 3:44 So what has changed for people that want to contribute to the PHP documentation?
Can you explain a little bit the difference between before and after?
Andreas Heigl 3:52 Before the documentation was moved, everything was in SVN, and there were generally two kinds of people. There were regular contributors that had an SVN account. They had the documentation on their machine. They could modify the documentation and just do an SVN commit, and everything worked smoothly, so the documentation was then actually built. We ran into some issues with that as well, but that's a different story. Now that everything has moved to GIT, the sources are on git.php.net. Before I come to that, how was it for people that did not have an SVN account? There was an awesome piece of technology on a web server called edit.php.net, which was a graphical user interface to the PHP documentation and to the translations. Everyone could more or less log in there, with an anonymous account for example, modify the documentation, and create, yeah, well, a kind of pull request: create a patch that was then reviewed and could then be merged by people with SVN access. That awesome piece of technology was an awesome piece of technology when it was created some 12 years ago or something like that. It has not changed much in between. So it was still kind of working, not always; it was offline for some time. And the people that had access to SVN were not really that responsive at all times, so it could take a while for your patch to actually be merged in. So how is it now? Now all the sources are on git.php.net, and there is a mirror for all translations on https://github.com/php/, for example doc-en for the English documentation. You have the repository, and you can create a pull request there. So you just go there, edit the file you want to change, and create a pull request. And then the pull request will be merged into the main sources, again by people with merge access. As this is a pull request, that usually happens a bit faster than before.
We are still not at the point where we can create a GitHub action for that, so that perhaps it can be completely automated after some technical things are resolved, like, yes, that does build, and we don't have any technical issues with that. We could just build that automatically and merge it automatically into the GIT sources. Those are possibilities that we now have, and from that point of view contribution is now much easier, as we are using technology that every one of us already knows.
Derick Rethans 6:34 The original translations worked with sub modules in SVN. How is that with the GIT approach?
Andreas Heigl 6:40 It actually didn't work with sub modules in SVN. It worked with one folder for the documentation, and then sub folders for the different translations. So there was a base folder PHP Doc, and within that base folder there was a folder EN for the English base, kind of, and then DE for the German translation, or PT_BR for the Brazilian Portuguese translation, or IT for the Italian translation. Or some other rather outdated translations that we actually didn't move over. We only moved everything that was touched within the last two years, which brought some interesting bugs. Of course we moved something that was touched within the last two years but is not considered an active translation, and that caused some havoc.
Derick Rethans 7:30 Which translation was that?
Andreas Heigl 7:31 That was the Italian translation. So actually, there is no official Italian translation of the PHP documentation. But there is an Italian translation that is actually being worked on, and hopefully at one point in time we can promote that to a valid active translation, so that Italian people can actually see some Italian documentation for them. In GIT, on the other hand, we have different repositories for the different languages.
It is now not possible to just check out phpdoc and have every translation. Now you actually have to say: I want to check out the English documentation, and I want to check out the Italian documentation, and I want to check out the Japanese documentation, because I want to work on each of them. That has some disadvantages, especially for people that are working on multiple translations. On the other hand, that also has advantages, because you don't need to download all the translations that you're not at all interested in.
Derick Rethans 8:35 But that shouldn't be something new, because I'm pretty sure that I've never checked out all the translations, even with SVN.
Andreas Heigl 8:41 With SVN you could decide to only check out certain translations, but if you checked out the PHP doc base folder you would get all the documentation. Yes, there were some sub modules that did exactly that: if you checked out the sub module for Italian, for example, you would get the English base and the Italian translation. That was all.
Derick Rethans 9:04 I remember we employed some SVN magic to do that kind of thing, but I forgot most of that because it's so long ago.
Andreas Heigl 9:11 Not really to worry about.
Derick Rethans 9:14 No.
Andreas Heigl 9:14 We're thinking about doing a similar thing for GIT now by using GIT sub modules, but we have not yet implemented that, because there were other pressing issues, like getting the ref check documentation up and running, where you can actually see which files are outdated, which files need to be translated, and stuff like that. So that was more pressing. Some other people have also done work on that. I need to check what the current status is, to be honest, because I didn't check that. That was going on very strongly in January, after we moved between Christmas and New Year's Eve.
That was after we ironed out some glitches that happened during the whole process, because of, yeah, sometimes also processes that were not really documented anywhere, and that I just got plain wrong. So then I had to invest some time and fix all that, but luckily that was during my holiday time, so I had a lot of time for it, so that was not an issue.
Derick Rethans 10:15 So working on the documentation during your holidays, huh?
Andreas Heigl 10:18 Yes, definitely.
Derick Rethans 10:20 That makes it different from travelling to visit family, because that is of course not something we could do here.
Andreas Heigl 10:25 Exactly. Though luckily, having family at home was quite okay. It was a nice change to actually be able to get away sometimes from seeing the same people over and over again.
Derick Rethans 10:38 Now the documentation has moved from SVN to GIT, and everybody can finally forget all their SVN commands. But the documentation itself is still written in Docbook XML, as far as I understand. Are there any discussions going on about changing that to something perhaps a bit more modern?
Andreas Heigl 10:58 Yes, there were a lot of discussions going on during the whole phase. I deliberately tried to calm that down to not do too many things at once. The thing that I wanted to get going was moving the documentation from SVN to GIT. Just change the underlying source code repository, and not change anything else in the process, because that was already hard enough. Now that we have moved, it is easily possible to actually modify or move the documentation to some other tooling, whether that is markdown or ASCII doc or whatever. I don't care, to be honest, because that's someone else's job to do. In my opinion, Docbook is actually a pretty good format for that.
Yes, it is rather verbose for sure, but it allows you to create a lot of different documentation formats, because HTML is not the only documentation that is created. There is also the possibility to create PDF documentation or CHM documentation for Windows, and stuff like that. I'm not 100% sure how that would work with something like markdown or ASCII doc.
Derick Rethans 12:12 There are different strengths in different formats. Markdown, for example, doesn't really allow you to link between documents, so that's probably not very handy, but there's, like, Pandoc, which is the stuff that the Python project uses. It's all pretty much designed around reStructuredText and linking between documents and stuff like that, so I guess that could be a way forward. It is certainly a lot easier to use than Docbook XML, but of course Docbook XML was created for this kind of rich markup without laying things out kind of situation, right?
Andreas Heigl 12:44 Yeah. The nice thing is actually that with Docbook, and perhaps that is possible with other tools as well, you have one single file where all the links are located, and every translation just links to this one file. So if a link changes for whatever reason, you can just modify that in one place and don't have to go through all the documentation and, of course, leave half of the links unchanged and broken, whatever. So there is a lot of stuff that is actually pretty cool. But as I said, that's now up for discussion. If there is someone that actually wants to tackle that and move the documentation format to something else, that is a different story. Go ahead, propose that to the internals mailing list or to the documentation team, and we'll see how it goes from there. The documentation itself, the source code itself, is hosted on a platform that we all understand, at least by now, a bit better than SVN.
Derick Rethans 13:42 Even though it started out on CVS, just like PHP.
Andreas Heigl 13:46 Yes.
Derick Rethans 13:47 A long, long time ago.
Andreas Heigl 13:49 I found the remaining stuff from the transfer from CVS to SVN, yes.
Derick Rethans 13:55 I am sure there are still some commits lurking somewhere for that.
Andreas Heigl 13:59 Oh yes, especially in the ref check, there is a lot of commented out code with, yeah, in CVS we did it this way, now we have to modify that for SVN. Yes, I'm pretty sure we now have some commits in there that modify the SVN stuff to make it usable with GIT.
Derick Rethans 14:16 Andreas, do you have anything else to add?
Andreas Heigl 14:19 All in all it was an awesome experience for me. I got to know a lot of people, a lot of awesome people. That was really, really insightful, and I'm really happy that I had the chance to do that, and the trust of the community was really amazing. If anyone wants to get into that kind of stuff, pick yourself something that the community needs and go for it, and don't let yourself be derailed by unresponsive mailing lists, or just things not happening. It's not because people think you are stupid, or the task is stupid. It's just because everything is done by volunteers, and that has its price.
Derick Rethans 15:00 It certainly does. Thank you, Andreas, for taking the time today to talk to me about moving the PHP documentation to GIT.
Andreas Heigl 15:07 Thank you very much. It was a pleasure to be here.
Derick Rethans 15:13 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.
Show Notes
Episode #28: Moving PHP Documentation to GIT
Credits
Music: Chipper Doodle v2 - Kevin MacLeod (incompetech.com) - Creative Commons: By Attribution 3.0
PHP Internals News: Episode 77: fsync: Buffers All The Way Down
London, UK Thursday, February 25th 2021, 09:05 GMT
In this episode of "PHP Internals News" I chat with David Gebler (GitHub) about his suggestion to add the fsync() function to PHP, as well as file and output buffers.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:13 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 77. In this episode I'm talking with David Gebler about an RFC that he's written to add a new function to PHP called fsync. David, would you please introduce yourself?
David Gebler 0:35 Hi, I'm David. I've worked with PHP professionally, among other languages, as a developer of websites and back end services. I've been doing that for about 15 years now. I'm a new contributor to PHP core; fsync is my first RFC.
Derick Rethans 0:48 What is the reason why you want to introduce fsync into the PHP language?
David Gebler 0:52 It's an interesting question. I suppose in one sense, I've always felt that the absence of fsync, given that some interface to fsync is provided by most other high level languages, has always been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP's core, getting to learn the source code. It's a very small contribution, but it's one that I feel is potentially useful, and it was easy for me to do as a learning exercise.
Derick Rethans 1:16 How did you find learning about PHP's internals?
David Gebler 1:19 Quite the roller coaster. The PHP internals are very arcane, I suppose I would say. It's something that's not particularly well documented. It's quite an interesting challenge to get into it.
I think a lot of it you have to pick up from digging through the source code, looking at what's already been done, putting together the pieces. But there is a really great community on the internals list, and indeed elsewhere online, and I found a lot of people very helpful in answering questions and giving feedback when I first opened my initial proof of concept PR.
Derick Rethans 1:48 Did you manage to find room 11 on Stack Overflow chat as well?
David Gebler 1:52 I did not, no.
Derick Rethans 1:53 I'll make sure to add a link in the show notes. It's where many of the PHP core contributors hang out quite a bit.
David Gebler 2:00 Sounds good to know for the future.
Derick Rethans 2:02 I read the RFC earlier today. It talks about fsync, but it also talks about flush, or fflush. What is the difference between them and what does fsync actually do?
David Gebler 2:14 That's the question that will be on everyone's lips when they hear about this feature being introduced into the language, hopefully. What does fsync do and what does fflush do? To understand that, we have to understand the different types of buffering involved when an application runs on a system. We have the application buffer, sometimes called the user space buffer, and we have the operating system buffer, or kernel space buffer.
Derick Rethans 2:36 And we're talking about writing to files here, right?
David Gebler 2:39 We're talking about writing to files with fflush and fsync, yep. fflush and fsync are both about getting data out to a file, about pushing something out of the buffers. But there are two different types of buffers: we have the application buffers and we have the operating system buffers. What fflush does is instruct PHP to flush its own internal buffers of data that's waiting to be written into a file out to the operating system.
What it doesn't do is give us any guarantee that the operating system will actually write that data to disk, because the operating system has its own buffer as well. Computers like to be efficient, operating systems like to be efficient; they will save up disk writes and do them in the order they feel like, at the time they feel like. What fsync does is also instruct the operating system to flush its buffers out to disk, thus giving us a better assurance that our data has actually reached disk by the point that function returns. Derick Rethans 3:29 I really only know about Linux here, but I know that on Linux there's a journal in the file system, or in most of the file systems that it uses. Would fsync make sure the data ends up in the journal, or is it also committed however the file system does that itself? David Gebler 3:45 The manner in which fsync synchronizes is indeed dependent on the particular POSIX-type operating system and file system that it's running on. I think you're right in respect of modern Linux and ext4: that would ensure the journal was also updated and that the data was persisted to disk. Older versions may behave a little bit differently. And one thing to note with fsync is that you can't take it as a cast-iron guarantee either that the data has been persisted to disk, or that the corresponding file system entries have been updated. It is the best assurance you can get that those things have happened, and by the point that function returns, again, in the PHP implementation, you have as solid an assurance as you're going to get. Because that manner of synchronization is dependent on the underlying system, it can vary.
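The two layers described above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: it relies on fsync() being available as the RFC proposes (the function shipped in PHP 8.1, so the call is skipped on older versions), and the helper name durableWrite() and the temp-file prefix are made up for the example.

```php
<?php
// Minimal sketch: write a record and push it through both buffer layers.
// durableWrite() is a hypothetical helper, not part of PHP or the RFC.

function durableWrite(string $path, string $data): bool
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        return false;
    }

    $ok = fwrite($handle, $data) === strlen($data);

    // fflush(): PHP's own userspace stream buffer -> operating system buffers.
    $ok = $ok && fflush($handle);

    // fsync(): ask the operating system to push its buffers out to disk.
    // Guarded because the function only exists from PHP 8.1 onwards.
    if ($ok && function_exists('fsync')) {
        $ok = fsync($handle);
    }

    fclose($handle);
    return $ok;
}

$path = tempnam(sys_get_temp_dir(), 'fsync_demo_');
$written = durableWrite($path, "order=42;status=paid\n");
$contents = file_get_contents($path);
unlink($path);
```

As the discussion stresses, a true return value here is the best assurance available rather than a hard guarantee, since the file system and the drive's own cache still have a say.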
Yeah, Linux with ext4 is probably your best bet with fsync, but one interesting thing to know, if it does go into PHP and people start using it, is that when you call fsync on a file, if you want to ensure that the file system entries are properly updated as well, you should also call fsync on a handle to that file's containing directory. And of course on a Linux or POSIX-type system you can do that: you can fopen a directory, and you can then call fsync on that handle. Derick Rethans 5:02 You mentioned that you can call fsync on a file handle. How's it different than calling just fsync without an argument? Is fsync without an argument just telling the operating system to flush its current buffers, and fsync on a file handle specifically to flush that specific file, and its metadata? David Gebler 5:22 So in the PHP implementation of fsync, you must supply it with a stream resource that is a plain file stream. You can obviously at the PHP level give it some other kind of resource, but it will raise a warning if it's not able to convert that into a plain file stream. Under the hood, when you're talking about C, at the underlying operating system level, fsync operates on a file descriptor, so it doesn't even operate on a stream handle; it operates on the actual underlying number that represents the file at the operating system level. Derick Rethans 5:51 So that is different from the Unix shell command called fsync. David Gebler 5:55 It is different. Yep. Derick Rethans 5:57 The RFC also talks about another function called fdatasync. How's that different from fsync itself? David Gebler 6:03 I think that's a really good question because it's got two answers. It's got the theoretical answer and the practical answer. The practical answer is that these days, it most likely makes no difference at all.
The theoretical answer is that fdatasync only pushes the actual file data out to disk, but it wouldn't necessarily update metadata about the file, such as the time it was last modified, the data that might be stored in the file system about the file. The reality now is that most operating systems will update both of those things at once. The reason they were separated out in the POSIX specification is because of course updating the metadata is another write, so a call to fsync could require two writes where a call to fdatasync would only require one. I'm not aware of any operating system and file system that you're likely to use now that actually still treats them differently. Derick Rethans 6:57 In PHP, we try to make sure that functions that are implemented work exactly the same across operating systems, and you've already explained that fsync depends a little bit on how the file system handles the specific requests. Would fsync also work in a similar way on Windows or OSX, or is it specifically meant for just Linux? David Gebler 7:20 It's not specifically meant for just Linux. fsync itself is part of the POSIX specification. Strictly speaking, fsync as an operating system level API does not exist on Windows. Windows does have a similar API mechanism called FlushFileBuffers, and in the RFC, and in the implementation attached to that RFC, on Windows, that's what fsync does: it's a wrapped call to FlushFileBuffers. It has, in practice, the same effect. OSX is a bit of a trickier one. fsync does exist on OSX, I know. I'm not a user of Apple products myself but I can tell you what I know about OSX. fsync on OSX will sort of attempt to flush your file buffers out, but OSX itself will not guarantee that the disk buffers are flushed. So we have another layer of buffering there. We have the application space buffering, we have the kernel space buffering, and we have the hardware disk buffering itself.
Physical drives will sometimes lie about having written data to disk. I mean, USB drives are a notorious example of that. A USB drive will tell your operating system it's finished persisting data when it hasn't, and you can even see that on the little flashing LED on the drive, and if you pull it out your data will be corrupted. OSX is not so reliable. The interface exists, but whether it's actually worked or not is open to question because it may not have flushed the disk cache. Derick Rethans 8:38 Are there any backward compatibility issues with this RFC or the implementation of this? David Gebler 8:43 I'm pleased to say there are no backwards compatibility issues at all. It's straightforwardly a new function that operates on plain file streams. It triggers a warning if you give it some kind of resource that can't be converted to a plain file descriptor, and there is no consequence to not using it. You just get the same behaviour you've had in previous PHP versions. It's just a new optional function. Derick Rethans 9:06 What has the feedback to introducing fsync and fdatasync been so far? David Gebler 9:11 When I originally proposed the RFC, I had a bit of feedback on the internals list and around some of the other parts of the PHP community that I reached out to in various places. Some people didn't see a need for it, which is fair enough, but my answer to that would be that when we look at the history of PHP as a web-first language, you can see why people might not have had much of a use case for something like fsync. I mean, it only fits particular situations, which is where you want some kind of transaction or guarantee. Quite often what people were doing with PHP were web applications where there was a degree of volatility in file writes that wasn't important to them; they didn't particularly care about it. What I would say now is that when we look at PHP eight, and the evolution of the PHP ecosystem.
You're seeing that it is such a versatile and performant general-purpose programming language these days that people are using it for all manner of tasks. They're using it in microservice architectures for back-end services, they're using it even for things like data science and machine learning, emerging industries like that, so it's got so many more applications now. Broadly the feedback I had was that, for the most part, it probably wouldn't take much more than a pull request for it to be accepted. I think it's a very non-controversial RFC. It's a small feature to add in the form of a new function. Derick Rethans 10:33 And that is very different than introducing enumerations or Fibers, of course. David Gebler 10:38 Both of which I'm looking forward to. Derick Rethans 10:40 We spoke a little bit about file system buffers: once you do a write it ends up in the application buffers; with fflush you can flush that to the operating system buffers, and with fsync to the file system. But in PHP development there are also other buffers. If you output something to the screen on the command line, that ends up directly on the terminal. When PHP runs in a web server environment it doesn't do that, because there are more buffers in between, right? There's PHP's own buffers, there's buffers you can configure, there are then web server buffers, and networking buffers. So it's buffers all the way down pretty much, isn't it? How does the buffering in PHP's output work, and what kind of things can you do with that? David Gebler 11:21 When we look at buffering in PHP, it is very easy to get confused because, as you say, it's buffers all the way down. On one hand we may be talking about buffers at the file system level and application space and kernel space, and on the other side we're gonna be talking about these kinds of buffers that you've just mentioned.
Again, this is something that PHP does primarily for performance and perhaps for a few other purposes in how it manages the application that is running. Buffering is a way of storing output somewhere before it gets sent on to the web server and ultimately from the web server to a user's browser. As you say, a web server has its own buffering as well, and PHP provides a couple of functions by which you can also attempt to force output to the browser. So again, much as we have fflush for files, we have flush for regular output that you're trying to send to a web server. And that function will flush the internal output buffers of PHP, and it will then try and flush your web server buffers. There's an interesting parallel here, because much like with fflush versus fsync, you can't necessarily guarantee with fflush what the operating system will do with your data once it has received it into its own buffers. With PHP you can't necessarily get a guarantee that a web server will flush its own buffers when you flush your output there. Perhaps we need to invent some kind of fsync for the web server as well. Output buffering is something that you can configure in PHP, and you can stack and nest output buffers as well. From an application developer's point of view, say you are using some other component, some other bit of code in your PHP system, that produces its own output. When I say output, I mean via the normal mechanisms by which you would write something to a browser in PHP, so things like echo at the simplest level. What you can do is use PHP's output buffer functions, which are all the ones that start with ob then an underscore, to capture that output and control what you do with it. Instead of it being output to the browser, you can capture it into a string, you can manipulate it, you can discard it, or you can throw it up the chain to be output yourself.
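The capture-and-nest pattern described above might look like the following sketch. The function legacyReport() is a hypothetical stand-in for a component that echoes its own output; ob_start() and ob_get_clean() are the standard output control functions.

```php
<?php
// Sketch: capture output produced by code we do not control.
// legacyReport() is a made-up example component.

function legacyReport(): void
{
    echo "total: 3 items\n";
}

ob_start();                    // open an outer buffer
echo "[report]\n";

ob_start();                    // nest a second buffer on top of it
legacyReport();
$inner = ob_get_clean();       // take the inner output as a string, close that buffer

echo strtoupper($inner);       // manipulate it, then pass it up to the outer buffer
$outer = ob_get_clean();       // nothing has reached the browser or terminal yet

echo $outer;                   // now emit it ourselves
```

Each ob_start() pushes a new buffer onto a stack, so output written inside the inner buffer only reaches the outer one when it is explicitly echoed back.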
That's what we would primarily use those buffers for in PHP, but output buffering is also something you can configure in the php.ini file. You can turn it off, you can set the buffer size. So you have a little bit of fine-tuning there that you can do. Derick Rethans 13:39 Something that just popped into my mind is that when you call the new fsync on a file resource handle, file resources in PHP are implemented with an underlying interface called streams. Is there another file-writing buffer in streams itself as well? David Gebler 13:58 There is a buffer at the actual C library level when it comes to streams. That's going into a little bit of the detail of what I was talking about earlier, where you have user space buffers and kernel space buffers. The reason those things exist is to do with the way an operating system manages a computer and keeps everything safe. User space isn't able to access kernel space. It's about ranges of memory addresses in the computer; we're getting quite low level now in terms of how all this works. In the underlying implementation in PHP source code, we're using the C file stream functions to write the streams, and that means that the actual data gets copied into a buffer in userspace. What fsync does is it instructs the kernel to make a copy of that data in its own memory space, and then push that out to disk. It's getting quite low level when we get into those details; it has to do with how file access is managed in PHP. There is an even lower level of access, which is more suitable for very high throughput, intensive IO operations, which isn't available in PHP at all. I'm not so familiar with how this works on Linux because the file systems are different, but I can tell you in Windows, if you want really intensive throughput, you don't want to be calling fsync, or the equivalent FlushFileBuffers, you know, hundreds of thousands of times per second.
You want to use what we call unbuffered output, where you write directly at the block level to disk. It wouldn't be suitable to try and do something like that in PHP, because it's too high-level a language. It's a very, very fine level of control that you need to be able to do that kind of thing. But with that level of control you also have a high level of consequence if something goes wrong. Derick Rethans 15:39 I think there's actually an extension in PECL, or there used to be one at least, I'm not sure whether it's still maintained for newer PHP versions, called dio or d.i.o., standing for direct IO, which I think implements some of these features, but I don't quite remember whether that was file system related, or only network-related direct IO. David Gebler 16:00 I think the direct IO extension did have lower level file system access, off the top of my head. It's been a long time since I looked at it, and I think it actually had the option to open a file in synchronized mode, which means every write is essentially implicitly calling fsync. What it didn't have was the ability to open files for writing in unbuffered mode, which is probably because that's lower level still. I mean, that requires you to literally know the block size of the file system you're writing to and to write in that size. Derick Rethans 16:30 Doing that from PHP goes a little bit too far, I suppose. David Gebler 16:34 It does go too far, I think. Derick Rethans 16:36 Then again, if you can open a file you can open a file device as long as you have permission. So, I guess, nothing stops you, at least on Linux, from opening /dev/sda whatever number it is, as long as you're running PHP as root, and writing to it, but I don't think this is a wise thing to do. David Gebler 16:52 Definitely something I'd urge people to try with caution.
I mean, obviously on Linux you do have this kind of extraordinary power from the combination of the user you're running a process as, coupled with the fact that Linux treats everything as a file, and that actually has some interesting implications for fsync. You can try compiling the branch that I've submitted in the PR for the RFC; compile it on Linux, call fsync on some different handles to things which aren't actually files, but which to the underlying Linux operating system look like files. Hopefully the implementation I've provided is robust enough that it should just return false when you try and fsync things that are not actually files. Derick Rethans 17:28 What kind of things are you thinking of here, things like directories? David Gebler 17:32 Directories are fine for fsync and you should actually be able to get a successful fsync on directories because they are literally part of the file system. It's fine on POSIX-type systems to do that. I'm not actually sure whether you can do that on Windows or not, but I don't think it provides any particular benefit to fsync a directory on Windows because the underlying FlushFileBuffers API works on the file level only. But on Linux you could try opening file handles to something like a Unix socket and try and fsync that. You should just get false in PHP land from that function because it knows that it can't convert that file handle into an actual file stream, and thus can't do a regular fsync on it. Derick Rethans 18:12 Have you created a test case for that situation? David Gebler 18:15 I have created some test cases for opening streams to things that are not files from PHP. I can't remember whether I've covered that particular one or not, but I have got a couple in there for things which are not files at the PHP level. Derick Rethans 18:29 That's good to hear. When do you think you will be putting this up for a vote? David Gebler 18:34 I'm planning to put this up for a vote later this week actually.
The RFC has been open for a little while, I think it's been about three weeks since I announced it on internals, and obviously there hasn't been a huge amount by way of feedback or discussion on it lately. That doesn't particularly surprise me, it is not a particularly controversial thing to add. I think a lot of people probably don't have much feeling about it one way or the other. But then, I'm hopeful, people who vote will see that as a reason to include it. Derick Rethans 19:00 Thank you David for explaining the fsync RFC and related topics to me today. David Gebler 19:05 Well, thanks for having me on. It's been great talking to you. And of course to anyone listening who's interested in the RFC, do check it out on the PHP wiki. Do have a look at the discussion on internals. And if you're a voting member, don't forget to vote when I open it up to vote later this week. Derick Rethans 19:21 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: fsync function Room 11 on StackOverflow PECL extension DIO Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 76: Deprecate null, and Array Unpacking
London, UK Thursday, February 18th 2021, 09:04 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Deprecate passing null to non-nullable arguments of internal functions, and Array Unpacking with String Keys. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 76. In this episode, I'm talking with Nikita Popov about a few more RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself? Nikita Popov 0:34 Hi, I'm Nikita. I work on PHP core development on behalf of JetBrains. Derick Rethans 0:39 In the last few PHP releases, PHP's handling of types with regards to internal functions and userland functions has been getting closer and closer, especially with types now. But there's still one case where type mismatches behave differently between internal and userland functions. What is this outstanding difference? Nikita Popov 0:59 Since PHP 8.0, the one remaining difference is the handling of null. So PHP 7.0 introduced scalar types for user functions. But scalar types already existed for internal functions at that time. Unfortunately, or maybe pragmatically, we ended up with slightly different behaviour in both cases. The difference is that user functions don't accept null unless you explicitly allow it using a nullable type or using a null default value. So this is the case for all user types, regardless of where or how they occur, as parameter types, return values, or property types, and independent of whether it's an array type or an integer type.
For internal functions, there is this one exception where if you have a scalar type like Boolean, integer, float, or string, and you're not using strict types, then these arguments also accept null values silently right now. So if you have a string argument and you pass null to it, then it will simply be converted into an empty string, or for integers into a zero value. At least I assume that the reason why we're here is that the internal function behaviour existed for a long time, and the userland behaviour was chosen to be consistent with the general behaviour of other types at the time. If you have an array type, it also doesn't accept null and just convert it to an empty array or something silly like that. So now we are left with this inconsistency. Derick Rethans 2:31 Is it also not possible for extensions to check whether null was passed, and then do a different behaviour like picking a default value? Nikita Popov 2:40 That's right, but that's a different case. The one I'm talking about is where you have a type like string, while the one you have in mind is where you effectively have a type like string or null. Derick Rethans 2:51 Okay. Nikita Popov 2:52 In that case, of course, accepting null is perfectly fine. Derick Rethans 2:56 Even though it might actually end up being different defaults. Nikita Popov 3:01 Yeah. Nowadays we would prefer to actually specify a default value instead of using null, but using null as a default and then assigning something else is also fine. Derick Rethans 3:13 What are you proposing to change here, or what are you trying to propose to change that into? Nikita Popov 3:18 To make the behaviour of userland and internal functions match, which means that internal functions will no longer accept null for scalar arguments. For now it's just a deprecation in PHP 8.1, and then of course later on that's going to become a type error.
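The inconsistency described above can be seen in a short sketch. It assumes the file does not declare strict_types, and userLength() is a hypothetical userland counterpart to strlen() introduced only for illustration. The deprecation notice mentioned in the comment applies from PHP 8.1 onwards.

```php
<?php
// Sketch of the userland vs internal difference in coercive typing mode.
// userLength() is a made-up example function.

function userLength(string $value): int
{
    return strlen($value);
}

$maybeName = null; // in real code this is usually a runtime value

// Userland function: null is rejected for a non-nullable string parameter.
try {
    userLength($maybeName);
    $userResult = 'accepted';
} catch (TypeError $e) {
    $userResult = 'TypeError';
}

// Internal function: null is silently coerced to "" and the length is 0.
// From PHP 8.1 this also emits a deprecation notice, suppressed here with @
// so the sketch runs quietly.
$internalResult = @strlen($maybeName);
```

Once the deprecation is turned into an error in a later major version, the second call would be expected to fail in the same way as the first.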
Derick Rethans 3:35 Have you checked how many open source projects are going to have an issue with this? Nikita Popov 3:40 No, I haven't. Because it's not really possible to determine this using static analysis, or at least not robustly, because usually null will be a runtime value. No one does this intentionally, like calling strlen with a null argument, so it's hard to detect this just through code analysis. I do think that this is actually a fairly high impact change. I remember that PHP 7.2, I think, introduced a warning for passing null to count(), and that actually affected quite a bit of code, including things like Laravel for example. I do expect that similar things could happen here again; strlen of null is pretty similar to count of null. But yeah, that's why it's a deprecation for now. So, it should be easy to at least see all the cases where it occurs and find out what should be fixed. Derick Rethans 4:35 What is the time frame for actually making this a type error? Nikita Popov 4:38 Unless it turns out that this has a larger impact than expected, it's just going to be the next major version as usual, so PHP 9. Derick Rethans 4:45 Which we expect to be about five years from now. Nikita Popov 4:49 Something like that, at least if we follow the usual cycle. Derick Rethans 4:52 Yes. Are there any other concerns for this one? Nikita Popov 4:55 No, not really. Derick Rethans 4:57 Maybe people don't realize it. Nikita Popov 4:58 Yeah, possibly. You can't predict these things. I mean, this is going to have way more practical impact for legacy code than the damn short tags. But for short tags, we get 200 mails and here we get not a lot. Derick Rethans 5:14 I think this will impact WordPress a lot. Nikita Popov 5:17 Possibly, but at least the thing they've been complaining about is that something throws an error without a deprecation, and now they're getting the deprecation so everyone should be happy.
Derick Rethans 5:28 Which, to be fair, I think is a valid concern. Nikita Popov 5:30 Yes, it is. I've actually been thinking about whether we should backport some deprecations to PHP 7.4 under an INI flag. Not my favourite thing to work on, but people did complain. Derick Rethans 5:47 Which ones would you put in there? Nikita Popov 5:48 I think generally some cases where things went from no diagnostics to error. I think something that was mentioned is vprintf and round, and possibly the changes to comparison semantics. I did have a patch that throws a deprecation warning when that changes, and that's the sort of thing that could be included. Derick Rethans 6:12 I would say that if we were in January 2020 here, when these things popped up, then it probably would have made sense to add these warnings and deprecations behind a flag for PHP 7.4, but because we have now done 15 releases of it, I'm not sure how useful this is to do now. Nikita Popov 6:30 I guess people are going to be upgrading for a long time still. I'm actually not sure how distros, for example Ubuntu LTS, update PHP 7.4, whether they actually follow the patch releases, because if they don't, then this is just going to be useless. Derick Rethans 6:48 Oh, there's that. Yeah. Derick Rethans 6:50 There is one more RFC that I would like to talk to you about, which is the array unpacking with string keys RFC. That's quite a mouthful. What is the background story here? Nikita Popov 7:00 The background is that we have unpacking in calls. If you have the arguments for the call in an array, then you write the three dots, and the array is unpacked into actual arguments. Derick Rethans 7:14 I love to call it the splat operator. Nikita Popov 7:16 Yes, it is also lovingly called the splat operator. And I think it has a couple more names. So then, PHP 7.4 added the same support in arrays, in which case it means that you effectively merge one array into the other one.
Both call unpacking and array unpacking, at the time, were limited to only integer keys, because in that case the semantics are fairly clear. We just ignore the keys, and we treat the values as a list. Now with PHP 8.0, for calls, we also support string keys, and the meaning there is that the string keys are treated as parameter names. That's how you can do a dynamic named parameter call. Actually, this probably was one of the larger backwards compatibility breaks in PHP 8. Not for unpacking but for call_user_func_array, where people expected the keys to be thrown away, and suddenly they had a meaning, but that's just a side note. Derick Rethans 8:21 It broke some of my code. Nikita Popov 8:23 Now what this RFC is about is to do the same change for array unpacking, so allow you to also use string keys. This is where, originally, there was a bit of disagreement about semantics, because there are multiple ways in which you can merge arrays in PHP, because PHP has this weird hybrid structure where arrays are a mix between dictionaries and lists, and you're never quite sure how you should interpret them. Derick Rethans 8:54 It's the difference between array_merge and plus, but which way around, I can never remember either. Nikita Popov 9:00 What array_merge does is, for integer keys, it ignores the keys and just appends the elements, and for string keys, it overwrites the string keys. So if you have the same string key one time earlier and again later, then it takes the later one. Plus always preserves keys, so for integer keys it doesn't just ignore them, but also uses overwriting semantics. But the semantics are the other way round.
If you have a key in the first array and the same key in the second array, then we take the one from the first array, which I personally find fairly confusing and unintuitive. So for example, the common use case for using plus is having an array with some defaults, in which case you have to add, or plus, the defaults as the second operand, otherwise you're going to overwrite keys that are set with the defaults, which you don't want. I don't know why PHP chose this order, probably there is some kind of idea behind it. Derick Rethans 10:01 It's behaviour that's been there for 20 plus years that might just have organically grown into what it is. Nikita Popov 10:07 I would hope that 20 years ago at least someone thought about this. But okay, it is what it is. So ultimately the choice for unpacking with string keys is between using the array_merge behaviour, the behaviour of the plus operator, and the third option, which is to just always ignore the keys and always just append the values. And some people actually argued that we should do the last one, because we already have array_merge and plus for the other behaviours, so this one should implement the one behaviour that we don't support yet. Derick Rethans 10:40 But that would mean throwing away keys. Nikita Popov 10:43 Yes. Just like we already throw away integer keys, so it's not completely out there. So yeah, that is not the popular option. I mean, if you want to throw away keys you can just call array_values and go that way. So in the end, the semantics it uses is array_merge. Derick Rethans 10:58 The array_merge semantics are... Nikita Popov 11:01 Append: ignore integer keys and just append, and for string keys, use the last occurrence of the key. Derick Rethans 11:07 So it overwrites. Nikita Popov 11:08 It overwrites, exactly. Which is actually also the semantics you get if you just write out an array literal where the same key occurs multiple times.
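The three behaviours compared above can be put side by side in a small sketch. The array contents are invented for illustration, and the last statement assumes PHP 8.1 or later, where unpacking with string keys is accepted as the RFC proposes (earlier versions throw an error for it).

```php
<?php
// Sketch comparing array_merge, the plus operator and unpacking.
// The $defaults / $options values are made up for the example.

$defaults = ['host' => 'localhost', 'port' => 3306, 0 => 'a'];
$options  = ['port' => 5432, 0 => 'b'];

// array_merge: integer keys are renumbered and appended, later string keys win.
$merged = array_merge($defaults, $options);

// Plus: keys are preserved and the FIRST operand wins,
// which is why the defaults have to go second.
$plus = $options + $defaults;

// Unpacking with string keys follows the array_merge semantics (PHP 8.1+).
$unpacked = [...$defaults, ...$options];
```

Here $merged and $unpacked both end up as host, port 5432, then 'a' and 'b' renumbered to keys 0 and 1, while $plus keeps 'b' under key 0 and never sees 'a'.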
Unpacking is like kind of a programmatic way to write a function call or an array literal, so it makes sense that the semantics are consistent. Derick Rethans 11:26 I think I agree with that actually, yes. Are there any changes that could break existing code here? Nikita Popov 11:32 Not really because right now we're throwing an exception if you have string keys in array unpacking. So it could only break if you're like explicitly catching that exception and doing something with it, which is not something where we provide any guarantees I think. So generally I think that, removing an exception doesn't count as a backwards compatibility break. Derick Rethans 11:55 I think that's right. Do you have anything else to add here? Nikita Popov 11:59 No, I think that's a simple proposal. Derick Rethans 12:02 Thank you, Nikita for taking the time to explain these several RFCs to me today. Nikita Popov 12:07 Thanks for having me Derick. Derick Rethans 12:11 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Array unpacking with string keys RFC: Deprecate passing null to non-nullable arguments of internal functions Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 75: Globals, and Phasing Out Serializable
London, UK Thursday, February 11th 2021, 09:03 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Restrict Globals Usage, and Phase Out Serializable. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 75. In this episode, I'm talking with Nikita Popov about a few RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself? Nikita Popov 0:34 Hi, I'm Nikita, I work at JetBrains on PHP core development and as such I get to occasionally write PHP proposals, RFCs, and then talk with Derick about them. Derick Rethans 0:47 The main idea behind you working on RFCs is that PHP gets new features, not that you end up talking to me. Nikita Popov 0:53 I mean, that's a side benefit. Derick Rethans 0:55 In any case we have a few to go through this time. The first RFC is titled phasing out Serializable; it's a fairly small RFC. What is it about? Nikita Popov 1:04 That finishes up a bit of work from PHP 7.4, where we introduced a new serialization mechanism, actually the third one we have. So we have a bit too many of them, and this removes the most problematic one. Derick Rethans 1:19 Which three serialization methods or ways of doing things currently exist? Nikita Popov 1:24 The first one, which doesn't really count, is just what you get if you don't do anything, so all the object properties get serialized, and also unserialized, and then we have a number of hooks you can use to modify that. The first pair is __sleep and __wakeup.
__sleep specifies which properties you want to serialize, so you can filter out some of them, and __wakeup allows you to run some code after unserialization, so you can do some kind of fix-up afterwards. Derick Rethans 1:52 From what I remember, if you use unserialize, which does the wake up, the constructor doesn't get called? Nikita Popov 1:59 During unserialization the constructor never gets called. Derick Rethans 2:03 So __wakeup is sort of the static factory method to rehydrate the objects. Nikita Popov 2:08 Exactly. Derick Rethans 2:08 So that's number one. Nikita Popov 2:10 Then number two is the Serializable interface, which gives you more control. Namely, you have to actually return the serialized representation of your object. What it looks like is completely unspecified; you could return whatever you want, though in practice what people actually do is recursively call serialize. And then on the other side, when unserializing, you usually do the same: you call unserialize on the string you receive, and then populate your properties based on that. The problem with this mechanism is exactly this recursive serialization call, because it has to share state with the main serialization. And the reason for that is that, well, PHP has objects, or object identity. So if you use the same object in two places, you really want it to be the same object and not two objects with the same content. Serializable has to be able to preserve that, and that requires that it runs in the middle of the unserialization. Derick Rethans 3:14 Not sure if I follow that bit. Nikita Popov 3:16 Well, maybe it's not a hard requirement, more like an issue with our serialization format that comes into play here. The way PHP implements this is using back references. So it first unserializes an object, and then later you can have a pointer back to it that says: I want to use the same object as at position number 10, or so.
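To make the back references concrete, here is a minimal sketch that is not part of the episode. It serializes an array holding the same stdClass object twice; the variable names are made up for illustration.

```php
<?php
// The same object placed in an array twice.
$shared = new stdClass();
$payload = serialize([$shared, $shared]);

// The second element is written as a back reference ("r:2;") to the value
// at position 2 of the serialized data, not as a second copy of the object.
echo $payload, PHP_EOL; // a:2:{i:0;O:8:"stdClass":0:{}i:1;r:2;}

// Because of that back reference, object identity survives the round trip.
$restored = unserialize($payload);
var_dump($restored[0] === $restored[1]); // bool(true)
```

The position in `r:2;` is the offset Nikita refers to: the array itself counts as value 1 and the first object as value 2.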
For these back references to work, we have to actually execute the serialization handler while unserializing, because otherwise the offsets will no longer match. So we can't just run this at the end of unserialization, for example, because then our offsets would be incorrect. And this is a big problem, because it's not really safe to run code during unserialization, because things are partially initialized. To make these back references work, PHP has to actually store pointers to these objects. And if you somehow modify things in specific ways, then these pointers become invalid. They point to memory that no longer exists, and you get a possibly exploitable crash. This is why we would like to get rid of this mechanism. Derick Rethans 4:25 But of course, in order to get rid of things, we had to have a better way of doing things in place first, right, which came with PHP 7.4. Nikita Popov 4:32 That's right. Derick Rethans 4:32 So that's number three. Nikita Popov 4:34 That's number three. Number three is actually very similar to number one: two new magic methods, __serialize and __unserialize. __serialize returns an array, usually an array of properties for example, and then __unserialize populates the object from that array. In practice, this works very similarly to the Serializable interface, just that you don't manually call serialize and unserialize, but PHP will do so on your behalf. So you just return an array or get an array, and PHP will integrate that into the main serialization, and because it's left to PHP, PHP can control where these calls occur. Derick Rethans 5:19 With __sleep, originally you only returned the names of the properties, whereas with this new interface you return the names of the properties but also their values. Nikita Popov 5:30 That's right. The new mechanism, in practice, serves as a replacement for the Serializable interface.
But from a technical side it's really close to __sleep and __wakeup, just that, as you said, instead of returning property names you return both names and values. Derick Rethans 5:51 And this is now the recommended way of doing serialization. Nikita Popov 5:54 One motivation was what I mentioned, the security problem. Maybe the thing that impacts users more commonly is that things like calling parent::serialize and parent::unserialize with the Serializable interface usually don't do what you want, again due to these back references, because the calls get out of order. If you do the same thing with the magic methods, with __serialize and __unserialize, you can safely call parent methods and compose serialization in that way. Derick Rethans 6:29 That's our state of serialization right now. We haven't spoken about the RFC yet; what are you proposing to do here? Nikita Popov 6:34 The RFC proposes to get rid of the Serializable interface, in a way that is a bit more graceful than just deprecating it outright. The idea is that if you have code that is still compatible with PHP 7.3, where the new mechanism doesn't exist, you probably still want to use Serializable. So if we just deprecated it outright, that would be fairly annoying for code that's compatible with both PHP 7.3 and 8.1. So instead, what we do is only deprecate the case where you implement Serializable without implementing the new mechanism. If you implement both of them, then you're fine for now. Derick Rethans 7:15 The new mechanism, the one we introduced in PHP 7.4, would override the PHP 7.3 one already anyway. Nikita Popov 7:22 Exactly. So on PHP 7.3 you would end up using Serializable, and on PHP 7.4 and higher you would be using the new mechanism.
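The "implement both" path described above could look roughly like the following sketch. The Point class is hypothetical and not taken from the RFC; it avoids typed properties so that the same file would also load on PHP 7.3.

```php
<?php
// Hypothetical class for illustration only.
class Point implements Serializable
{
    public $x;
    public $y;

    public function __construct($x = 0, $y = 0)
    {
        $this->x = $x;
        $this->y = $y;
    }

    // New mechanism (PHP 7.4+): return plain data and let PHP serialize it.
    public function __serialize(): array
    {
        return ['x' => $this->x, 'y' => $this->y];
    }

    // New mechanism (PHP 7.4+): repopulate the object from that data.
    public function __unserialize(array $data): void
    {
        $this->x = $data['x'];
        $this->y = $data['y'];
    }

    // Legacy Serializable methods, only used by PHP 7.3 and older.
    public function serialize()
    {
        return serialize($this->__serialize());
    }

    public function unserialize($data)
    {
        $this->__unserialize(unserialize($data));
    }
}

// On PHP 7.4+ this round trip goes through __serialize()/__unserialize().
$copy = unserialize(serialize(new Point(1, 2)));
```

Because the legacy methods simply delegate to the new ones, both code paths produce the same object state; only the outer serialized format differs between PHP versions.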
And then, at a later point in time, we would actually also deprecate Serializable itself and then remove it, though, based on the mailing list response, some people at least didn't like the long timeline. I'm not exactly sure what the alternative is: either to deprecate Serializable right away, or to later remove it without deprecation of the interface itself. Derick Rethans 7:57 Yeah, from what I saw, it was the long-term-ness of phasing it out. I think it was mentioned that it would finally get removed in PHP 10, which is potentially 10 years away, right, if we follow up every five years with a new major release. But then in the end, it does have some merit, making sure that people can move on without being left in the dark at some point, right? What is your own preference? Nikita Popov 8:22 My own preference is what I proposed. I would also be fine with, say, in PHP 8.1 we follow the proposal, so you only get a warning if you only implement Serializable without the new mechanism, and then in PHP 9 we could just drop Serializable entirely. I think that would not be, because then the only problem would be if you have code that is compatible with both PHP 7.3 and PHP 9.0. I am sure that code will exist ... pretty normal version range to have. Derick Rethans 9:08 Yeah, I probably would agree with you there. When I read the RFC it also mentioned PDO. Why would it mention PDO? Nikita Popov 9:15 This is something I only found out while writing it: there is a PDO::FETCH_SERIALIZE flag, which automatically calls unserialize when fetching values. So I will not comment on the really dubious idea of storing serialized data in the database. Derick Rethans 9:35 I mean, people would currently say that the alternative is to store JSON in these columns as values. Nikita Popov 9:40 That would still be better. Derick Rethans 9:42 But it's still a serialized format?
Nikita Popov 9:44 But at least the way this flag is implemented is effectively broken, because it doesn't just call unserialize, the function; it calls unserialize on the Serializable interface. I have no idea how this was intended to be used in practice, because it's not compatible with the normal serialization of the class. In practice, everything I have found about this online is basically just: okay, this functionality is broken, you shouldn't use it. Derick Rethans 10:15 So you have fewer concerns about just removing that straight away, I suppose. Nikita Popov 10:19 Yeah. Derick Rethans 10:20 Do you have anything else to add about serialization? Nikita Popov 10:22 I think this proposal is a very simple one, and we have actually talked way too much about this. Derick Rethans 10:29 Let's move on to the next RFC, which is titled Restrict Globals Usage. This title almost sounds worse than it is, as it might imply that you want to get rid of the $GLOBALS array altogether. But I bet that's not the case. And I also suspect that restricting the $GLOBALS array is a lot more technical a subject than it might seem. Nikita Popov 10:49 That's right. So this is really mostly motivated by internal concerns, and hopefully does not have a great deal of impact on practical usage. There are a couple of motivations. Some of them are about semantics: $GLOBALS is a very magic variable that does not follow the usual semantics of PHP in a number of ways. In particular, arrays are typically by value. In all other cases they are by value, which means that if you copy an array and modify one copy, then the other one doesn't get modified; I mean, it's a copy, so obviously it doesn't get modified. For $GLOBALS that's not the case. If you make a copy of $GLOBALS and you modify the copy, then the original array also gets modified. Derick Rethans 11:36 Which is not the case for other superglobals such as $_GET and $_POST.
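The copy behaviour described here can be seen in a few lines. This is a minimal sketch, assuming it runs as a top-level script so that $demoCounter (a made-up name) is a global variable; the outcome depends on the PHP version, as noted in the comments.

```php
<?php
$demoCounter = 1;

$copy = $GLOBALS;             // looks like an ordinary by-value array copy
$copy['demoCounter'] = 99;    // modify only the copy

// PHP 8.1 and later: prints 1, because $copy really is an independent copy.
// PHP 8.0 and earlier: prints 99, because the write reaches the real global,
// which is the surprising behaviour discussed above.
echo $demoCounter, PHP_EOL;
```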
Nikita Popov 11:41 The other superglobals are a bit magic, but not that magic. There are a couple of other concerns with edge cases, but I think the real motivation here is the internal concern, and that's how $GLOBALS is implemented. PHP normally manages variables in functions and scripts using so-called compiled variables. This works as follows: when the script is compiled, we actually see all the variables that are used, at least all the variables that don't go through something like variable variables or $GLOBALS or something like that. And we reserve a slot for each of these variables, so we can directly access it. We don't have to look up the variable by name; we just say this is variable number seven and we can directly access it, which is much, much more efficient. The problem then is that if you have something like $GLOBALS, you want to have both this access by index and access by name, and we do that by storing a pointer inside the $GLOBALS array to the actual location of the variable. Yeah, so this is a very special concept. We call this a value of indirect type, and it essentially occurs only inside the $GLOBALS array and for object properties. For object properties it happens for the same reason: object properties are normally accessed by index, but if you do something like dynamic property access on an object, then we also have to look it up by name. There we do the same thing: we have a map from property names to values, and if the value is really stored inside an object property slot, then we just store a pointer there. The thing with objects is that this is really an internal concern that's well encapsulated and doesn't leak into normal PHP code. That's not the case with $GLOBALS, because $GLOBALS is on the surface just a normal array. So you can do everything with it that you do with a normal array; you can pass it to functions.
In theory, all the functions need to deal with this special value type that says: okay, actually this is not the value itself, it is just a pointer to the value. The way you do it is that every time you access a value, you check whether this is an indirect value; if it is, you follow the pointer. Derick Rethans 14:01 I have plenty of code in Xdebug for this. Nikita Popov 14:04 So it's really a super simple operation to do, but you actually have to do it. And you have to do it absolutely everywhere, if you're being pedantic. In practice that just doesn't happen. In PHP's own code, in the standard library, the array functions do consistently handle this edge case. But if you go further, even most bundled extensions, and certainly most third-party extensions, are not going to do this, and if they don't, either they just get some, you know, benign misbehaviour where it looks like array elements are missing, or you get a crash, because the type is simply not handled. Yeah, well, that's not a great state to be in, because passing the $GLOBALS array into something like array_pop is a very weird operation to do. I don't know if anyone has ever done that for purposes outside testing PHP. But to support it, we have to handle this special case everywhere, which is not robust and also has a certain performance impact when it comes to low-level operations. So we also have to do this check every time you access an array, for example from normal PHP code. The idea is to remove the special case. That's the motivation here. Derick Rethans 15:23 What are you proposing to change? Nikita Popov 15:26 One is if you just access a variable in $GLOBALS. So you write $GLOBALS['name'], with some variable name. Then we treat that specially and compile it down to an access to this global variable. So it could be a read access, could be a write access, or anything else. Derick Rethans 15:44 But it is something that happens when PHP compiles scripts.
Nikita Popov 15:48 That's right. The second part is that you can also access the $GLOBALS array in a read-only way, so you can take the whole array and, for example, do a foreach loop over it. And that continues to work. The part that doesn't work is taking the whole $GLOBALS array and modifying it in some way; for example, passing $GLOBALS to array_pop, which requires passing it by reference, is going to throw an error. Derick Rethans 16:13 At which stage is that going to throw an error? Nikita Popov 16:15 That's usually during compilation, but specifically for the case of by-reference passing it can only be detected at runtime, because we don't always know at compile time if it's a by-reference or by-value pass. But for most of the cases it's a compile-time error. Maybe one particular case that's worth mentioning is that you also can't do a foreach by reference over it. So if you want to loop over $GLOBALS and modify entries while doing so, the way to do it now would be to do a by-value loop and then just access specific elements again, like $GLOBALS[$key] or something. And the reason why this helps us is that when you access $GLOBALS, we can actually return a copy of the array. We don't have to maintain these indirect pointers, which are only necessary to support modifications; we can just return a copy. That means we no longer have to deal with this edge case in most places, in the engine and in third-party extensions. Derick Rethans 17:15 Talking about third-party extensions, the code that implements this RFC has already been merged into PHP 8.1, but the moment you did that, tests in Xdebug started failing, because I read the $GLOBALS array, but it doesn't seem like it exists any more now. Nikita Popov 17:31 That's actually a good point. $GLOBALS I would now view as more like a syntax construct, similar to variable variables, or even the $this variable. So this is also not a real variable.
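As a rough illustration of which uses keep working, the sketch below writes through $GLOBALS['name'], iterates by value, and writes back through $GLOBALS[$name], which is the replacement pattern Nikita suggests for by-reference loops. The demo_ variable names are invented for this example, and the rejected forms are left commented out so the file still compiles.

```php
<?php
// Reads and writes of a single element keep working.
$GLOBALS['demo_a'] = 1;
$GLOBALS['demo_b'] = 2;

// By-value iteration over the whole array keeps working. To modify an
// entry, write back through $GLOBALS[$name] instead of using a reference.
foreach ($GLOBALS as $name => $value) {
    if (strpos((string) $name, 'demo_') === 0) {
        $GLOBALS[$name] = $value * 10;
    }
}

echo $GLOBALS['demo_a'], ' ', $GLOBALS['demo_b'], PHP_EOL; // 10 20

// Rejected on PHP 8.1 and later:
// array_pop($GLOBALS);                 // by-reference pass of the whole array
// foreach ($GLOBALS as &$value) { }    // by-reference iteration
```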
$GLOBALS is no longer added as an actual variable in the symbol table; it is directly compiled down to either an access to the specific global, or it returns a copy of the table. So for Xdebug, you probably have to access EG(symbol_table). Derick Rethans 18:02 Yes, but it wasn't as simple as it seemed, because this is a hash table and no longer a full array, which means that all my logic code doesn't work with that. So I've decided that $GLOBALS just no longer exists and stuff, which is what it logically is in PHP 8.1 anyway. Nikita Popov 18:22 So that might actually be nice. I know that code that does work with $GLOBALS as an array usually also skips $GLOBALS itself when iterating over it, because otherwise you usually run into some kind of infinite recursion issue. That's actually another thing: $GLOBALS is the one way you can have a recursive array without references being involved. So I know that the Symfony variable cloner/dumper, which goes to a lot of effort to detect cycles, has some extra fun hacks to detect $GLOBALS correctly for that reason, because usually you just take references, but for $GLOBALS that doesn't work. Derick Rethans 19:09 Right, how much of an impact is this going to have on existing code? Nikita Popov 19:12 So I analysed the top Composer packages and found not a lot of usages. I don't remember the exact number; it was maybe five cases that break. That's not to say that it has no impact. I do know that PHPUnit eight point whatever had such a $GLOBALS use, which was fixed already, because Sebastian Bergmann now adds support for new PHP versions to both PHPUnit 8 and 9. If you're using PHPUnit 7, then probably it's no longer going to work for that reason. Of course, it also doesn't work for many other reasons as well.
Depending on which features you use, but I do know that, you know, sometimes if you're not using mocks, then you can often use old PHPUnit versions, but I think that's no longer going to work in this case. Derick Rethans 20:04 It's something that users of PHP and PHPUnit probably should start testing once the alpha and beta releases of PHP 8.1 start happening. Nikita Popov 20:16 Right. I mean, I hope that it's not going to be a big issue. After all, this is a minor PHP version, so we really shouldn't be introducing bad breaks, but at least the usage I've seen in open source projects suggests that it should not be a big problem. Derick Rethans 20:33 Excellent. As I've mentioned, this RFC has already been merged. So I don't really have to ask about feedback, because it's irrelevant right now. It's already there. Nikita Popov 20:44 Well, you could still have feedback afterwards. Derick Rethans 20:48 Thank you, Nikita, for taking the time to explain these several RFCs to me today. Nikita Popov 20:52 Thanks for having me, Derick. Derick Rethans 20:57 Thank you for listening to this instalment of PHP Internals News, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Restrict Globals Usage RFC: Phase Out Serializable Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 74: Fibers
London, UK Thursday, February 4th 2021, 09:02 GMT In this episode of "PHP Internals News" I talk with Aaron Piotrowski (Twitter, Website, GitHub) about an RFC that he is proposing to add Fibers to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP Internals News, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 74. Today I'm talking with Aaron Piotrowski about a Fiber RFC that he's working on together with Niklas Keller. Aaron, would you please introduce yourself? Aaron Piotrowski 0:33 Hi everyone, I'm Aaron Piotrowski. I started programming with PHP back in 1998 with PHP 3, so I've just dated myself there, but I've worked with a lot of different languages over the last few decades, and PHP has always remained one of my favourites, and I'm always drawn back to it. I've gotten a lot more involved with the PHP project since PHP 7. The Fiber RFC is the first major contribution I have attempted, though. In the past I did the RFC for the Throwable exception hierarchy, and the iterable pseudo-type in PHP 7.1. Derick Rethans 1:12 Yeah, these things are both from before I started doing the podcast, so hence we haven't spoken yet, at least on here. We've actually met at some point in the past. I've had a read through the Fiber RFC this morning, but I'm still fairly baffled. Could you perhaps explain in short what Fibers are and where the idea comes from? And what is your specific interest in adding them to PHP?
Aaron Piotrowski 1:35 A few other languages already have Fibers, like Ruby, and they're sort of similar to threads in that they contain a separate call stack and a separate memory stack, but they differ from threads in that they exist only within a single process, and that they have to be switched to cooperatively by that process rather than actively by the OS like threads. So sometimes they're called green threads. The generators that are in PHP already are actually somewhat similar to Fibers, but generators differ in that they're stackless. What that means is that a generator function can only be interrupted at one layer, whereas a Fiber can be interrupted anywhere in the call stack. So imagine if you had a generator where the yield could be very deep in a function call, rather than at the top level. Just as generators can be used to make interruptible functions, Fibers can also be used to create similarly interruptible functions, but without having to know exactly when it's going to be interrupted: not at the top level, but at any point in the call stack. And so the main motivation behind wanting to add this feature is to make asynchronous programming in PHP much easier, and to eliminate the distinction that usually exists between async code that uses promises and the synchronous code that we're all used to. Derick Rethans 3:09 So what specifically are you proposing to add to PHP here then? Aaron Piotrowski 3:12 Specifically, I'm looking at adding a low-level Fiber API that's really aimed at async framework authors, to create their own more opinionated APIs on top of that low-level API. So it adds just a couple of classes, Fiber and FiberScheduler, along with a couple of exception classes and reflection classes for inspecting Fibers. When a Fiber is suspended, execution switches to the FiberScheduler, which is a special Fiber that's able to start and resume regular user Fibers.
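The difference from generators, suspending from deep inside the call stack, can be sketched as follows. This uses the Fiber class as it eventually shipped in PHP 8.1, which does not include the FiberScheduler class from the draft discussed in this episode; the function names are hypothetical.

```php
<?php
// Requires PHP 8.1+. Function names are made up for illustration.
function readChunk(): string
{
    // Suspends two calls below the Fiber's entry function, something a
    // generator's yield cannot do.
    return Fiber::suspend('need-data');
}

function parseMessage(): string
{
    return strtoupper(readChunk());
}

$fiber = new Fiber(function (): string {
    return parseMessage();
});

$request = $fiber->start();     // runs until Fiber::suspend(), yields 'need-data'
$fiber->resume('hello');        // 'hello' becomes the result of Fiber::suspend()
$result = $fiber->getReturn();  // 'HELLO'
```

Neither parseMessage() nor the closure needs to know that readChunk() may suspend, which is the point about not having a separate style for async code.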
So a Fiber scheduler is generally going to be something like an event loop: when a Fiber is suspended, the scheduler event loop will resume certain Fibers in response to events, like data becoming available on a socket or a timer expiring. Derick Rethans 4:17 How would the event loop decide which Fiber to resume, depending on input, for example? Aaron Piotrowski 4:24 It's largely up to how that framework chooses to write that event loop, but in general, when a Fiber is going to suspend, it'll set up some sort of callback, or add itself to an array of Fibers that are waiting on events, and then execution switches to the event loop. Derick Rethans 4:44 A Fiber, before it suspends itself, sets up or adds another Fiber to the scheduler. Aaron Piotrowski 4:48 That's exactly right. Before a Fiber suspends itself, it adds itself to some sort of event in the event loop that, when it triggers, will resume that Fiber. So, if you're familiar with how some of the other async frameworks work now, they'll add something like a callback or promise to the event loop that's resolved. This is sort of working the same way, except that it's just resuming a Fiber instead of invoking a callback, although, you know, that Fiber might be resumed by invoking a callback. Derick Rethans 5:23 Is the Fiber scheduler, or the event loop, however you want to call it, something you would, or can, also make use of in normal PHP applications? By normal I mean, not like an async PHP framework. Is that the intention at all? Aaron Piotrowski 5:39 Using Fibers kind of helps you eliminate that boundary that exists when trying to put asynchronous code into something using synchronous code, because you end up with a promise that you have to await, and that doesn't really work very well where you're trying to mix it with sync code. Fibers eliminate the need for a promise.
Since the asynchronous function can still return types, you can mix async code into a traditional sync application, even something running in Nginx or Apache; it doesn't have to be a fully asynchronous app to make use of some async I/O. Derick Rethans 6:23 For example, if you want to do multiple database calls at the same time, will you be able to use Fibers for that? Aaron Piotrowski 6:30 Exactly. Derick Rethans 6:31 If a user had a PHP application and wanted to make multiple database calls at the same time, how would they set it up with Fibers and the Fiber scheduler? Aaron Piotrowski 6:40 This low-level API is aimed primarily at framework authors. Generally, if you're writing an application with it, you're probably going to use one of those frameworks, so it would largely depend on how that framework sets that up. In general, though, those frameworks are going to provide some sort of abstraction for running code concurrently, and they'll probably have their own sort of placeholder object, like a promise again. So when you start running things concurrently, they return something that you can then wait on until all those things complete executing. So it doesn't totally eliminate the need for promises, but it does allow async code to not always have to return a promise; rather, a promise is only required when you want concurrency, and, you know, a framework will provide tools to await that, which can still be mixed in with synchronous code. Derick Rethans 7:48 Do I understand this correctly that you won't need a promise unless you mix it with synchronous code? Aaron Piotrowski 7:54 You won't need a promise unless you explicitly need concurrency. Derick Rethans 7:57 Okay, that makes more sense, I suppose. Aaron Piotrowski 7:59 It's difficult to explain; it's so much easier with examples.
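Since examples are hard to do in audio, here is a toy sketch of the idea, written against the Fiber class as shipped in PHP 8.1 rather than the draft FiberScheduler API. It is a simple round-robin loop and not a real event loop: an actual framework would resume a Fiber when a socket or timer becomes ready. The task names and the $makeTask helper are invented for illustration.

```php
<?php
// Requires PHP 8.1+.
$log = [];

// Builds a Fiber that pretends to wait for I/O a given number of times.
$makeTask = function (string $name, int $steps) use (&$log): Fiber {
    return new Fiber(function () use ($name, $steps, &$log): string {
        for ($i = 1; $i <= $steps; $i++) {
            $log[] = "$name step $i";
            Fiber::suspend();   // hand control back to the loop below
        }
        return "$name result";
    });
};

$fibers = [
    'users'  => $makeTask('users', 2),
    'orders' => $makeTask('orders', 1),
];
$results = [];

// Start every task; each one runs until its first suspension.
foreach ($fibers as $fiber) {
    $fiber->start();
}

// Toy scheduler: keep resuming unfinished tasks and collect return values.
while ($fibers) {
    foreach ($fibers as $key => $fiber) {
        if ($fiber->isTerminated()) {
            $results[$key] = $fiber->getReturn();
            unset($fibers[$key]);
        } else {
            $fiber->resume();
        }
    }
}

print_r($log);
print_r($results);
```

The log shows the two tasks interleaving ("users step 1", "orders step 1", "users step 2"), even though each task is written as ordinary sequential code.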
Derick Rethans 8:03 Yeah, but examples are very difficult to do in audio only. Aaron Piotrowski 8:07 Yes, exactly. Say you have a database query that returns a result. If you want to run multiple queries at the same time, the async library that uses Fibers underneath would be able to provide an abstraction that allows you to run multiple queries at once. Those queries, run concurrently, would each return a promise. But you would be able to collect those promises together, and use something like an await function, provided by that async framework, to then get the results of all of the queries at once. Derick Rethans 8:47 You mentioned that Fibers are not threads; they're sort of logical threads, but not physical threads, in the same process. PHP isn't multi-threaded, so how would this work internally? What would a Fiber do or store so that the scheduler can resume it, for example? What is the internal mechanism, and how does this interact with PHP itself? Aaron Piotrowski 9:11 Each Fiber is allocated a C stack and a VM stack on the heap. So switching between them is similar to generators: when switching between Fibers, the current VM stack is swapped and the C stack is swapped, but it doesn't touch any of the other memory in the process, so things like globals are still accessible to each Fiber. Since only one Fiber can be executing at a time, you don't have some of the same race conditions that you have with threads, of memory being accessed or written to by two threads at the same time. It can still happen with Fibers that you have two Fibers that depend on the same memory, and you may have to do some of the same sort of synchronization of access to that memory that you have to do with threads, if you don't want interleaving Fibers to potentially overwrite that memory. That's the sort of thing that's being left, again, to the async frameworks that would use this, to provide that sort of mechanism over a low-level Fiber API.
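The interleaving problem can be reproduced deterministically. The following sketch, again using the PHP 8.1 Fiber class and a made-up shared object, shows one Fiber's update being lost because both Fibers read the value before either writes it back.

```php
<?php
// Requires PHP 8.1+. Hypothetical shared object touched by two Fibers.
$account = new stdClass();
$account->balance = 100;

$withdraw = function (int $amount) use ($account): void {
    $seen = $account->balance;             // read the shared value
    Fiber::suspend();                      // another Fiber may run here
    $account->balance = $seen - $amount;   // write based on a stale read
};

$a = new Fiber($withdraw);
$b = new Fiber($withdraw);

$a->start(30);   // reads 100, then suspends
$b->start(50);   // also reads 100, then suspends
$a->resume();    // writes 70
$b->resume();    // writes 50, overwriting the first withdrawal

echo $account->balance, PHP_EOL; // 50, not 20
```

No two Fibers ever run at the same instant here, yet the result is still wrong, which is why sharing data through a framework-provided abstraction is the safer choice.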
Derick Rethans 10:15 Of course, when a Fiber is running, there's no need for locking anything, because nothing runs at the same time anyway. And of course, when a Fiber suspends itself, it then sort of knows that, well, I'm unlocking what I was using, so you don't have this synchronization issue there. Aaron Piotrowski 10:32 You don't have the synchronization issue where you have to worry that while this Fiber is running, another Fiber might overwrite the same memory. But there is a potential that, if a Fiber suspends, another Fiber could have overwritten some global memory while it was suspended. So if you're sharing memory between Fibers, it's best to use some sort of abstraction, like channels in Go, to share data between Fibers rather than a global. It could just be a global; it could even be a class property or something, anything that you might share between two Fibers. You could give the same object to two different Fibers, and those Fibers could modify that object. Well, I wouldn't recommend doing that; I would share that object over something like a channel instead. Derick Rethans 11:23 Your RFC doesn't talk about channels. So I reckon that'd be something else that has to be implemented, probably with Fibers, in the async framework. Aaron Piotrowski 11:31 Exactly, yes. Derick Rethans 11:32 What is your reason to want to add this to PHP core instead of having it sit in a PECL extension? Because I could argue that this isn't something that many PHP developers would ever use. Aaron Piotrowski 11:43 I definitely see that point. I think that availability, being able to use it in any sort of application, would be important; for some reason there still seems to be a hesitation on certain platforms to install extensions. But beyond that, there are reasons that you'd want to have it in core all the time: extensions that want to profile code will need to be aware of Fibers.
And if Fibers are an extension, well, then actually making use of it in a real application might be difficult, because your code profilers don't work very well, because they don't understand the Fiber switching. So that is one area where, if this were merged into core, code profilers would probably have to be updated to account for that. There is also a bit of an issue in the extension right now due to destructor order, how the shutdown logic goes, and what hooks are available in PHP: if a registered shutdown function or a destructor suspends a Fiber, it might have to restart the scheduler unnecessarily. But if it were in core, I could avoid that. And then there are also issues with how to handle some of the global stacks that PHP provides when switching Fibers: should those be reset, should they remain? But those are issues that can only be addressed if Fibers are part of the core rather than an extension. Otherwise I have no choice but to just leave them as stacks that aren't switched. Derick Rethans 13:22 Okay, yeah, that makes sense, because the stack switching is something that is trickier to do from an extension. Aaron Piotrowski 13:28 Like the error handler, you know, how should that be handled? Should the error handler stack depend on which Fiber is running, or should it just remain a constant global? I can't change that from an extension; that would have to be part of core. Derick Rethans 13:41 Because Fibers allow you to basically switch between threads, have you had a look at how debuggers, for example, deal with this? Aaron Piotrowski 13:50 In my testing with Xdebug, I didn't have any issue with inspecting execution stacks, or code coverage, but I will have to really defer to you on whether you think that there's anything in Xdebug that would have to be updated or changed to accommodate it. So far it's worked very well. Derick Rethans 14:10 I know you submitted a bug report with a crash, but that's been fixed already, of course.
What was that issue actually? I don't quite remember what it was.
Aaron Piotrowski 14:18 Something with code coverage; I honestly don't really remember any more. It was an invalid pointer for something.
Derick Rethans 14:26 It's an interesting thing that with all these fancy extensions, Fibers not being the only one, sometimes you run into extensions that do something very strange that then makes things crash in Xdebug. I can't always test for that up front, of course. I actually have a slightly related question that pops into my head here: there's also something called Swoole PHP, which does something similar, but from what I understand actually allows things to run in threads. How would you compare these two frameworks, or approaches is probably the better word?
Aaron Piotrowski 15:00 Swoole tries to be the Swiss Army knife in a lot of ways: they provide tools to do just about everything, and they provide a lot of opinionated APIs for things. In this case I'm trying to provide just the lowest level, only the very necessary tools that would be required in core to implement Fibers. I do believe Swoole implements Fibers as well; they use the term coroutine for their Fibers. I believe they actually use the same Boost assembly language code that I used for swapping C stacks. I'm not sure if they provide actual threading as well. If they do, then that's great. Of course, threading still requires a ZTS build of PHP. Fibers do not, because it's still within one process.
Derick Rethans 15:55 I know that Swoole definitely doesn't work with Xdebug because of the way they do things, but it sounds like Fibers will actually work just fine.
Aaron Piotrowski 16:02 It seems so, yes. I've used it already extensively with PhpStorm, setting breakpoints and things to debug. That was when I was upgrading some of the AMP libraries, to figure out what was going wrong, and it worked perfectly.
Derick Rethans 16:16 Are you involved with AMP?
Aaron Piotrowski 16:18 Yes, I am. I'm one of the primary maintainers now, along with Nicolas. I didn't start the library. The original author has moved on to other things, but it's pretty much just Nicolas and I doing most of it now. Bob still contributes occasionally as well.
Derick Rethans 16:38 And I guess that's where your interest in having Fibers in PHP comes from, then?
Aaron Piotrowski 16:42 Yes, exactly.
Derick Rethans 16:44 What has the feedback been so far?
Aaron Piotrowski 16:47 Largely positive from the people that are more familiar with it. I haven't actually gotten a whole lot of feedback from the core contributors of PHP, so I'm not really sure where the proposal stands with them at the moment. But I guess maybe no feedback is good feedback: if they had a problem with it, somebody would have spoken up by now. I'm not sure.
Derick Rethans 17:09 That is often the case, right? If there is something to be added that is quite complicated, you get a lot less feedback than when there's something very simple, like picking a name for a function, right?
Aaron Piotrowski 17:19 Yes, exactly.
Derick Rethans 17:21 When do you think you will be putting this up for a vote?
Aaron Piotrowski 17:24 I think I want to wait at least another month or so. I did make a recent change to how the Fiber scheduler API worked, and so I wanted to make sure that people had time to review it. Maybe send another reminder email or two to internals, so that more people get a chance to look at it and play with it and provide feedback.
Derick Rethans 17:47 Somewhere around mid February?
Aaron Piotrowski 17:49 Something like that, yeah.
Derick Rethans 17:51 Did we miss anything discussing Fibers? Do you have anything to add yourself?
Aaron Piotrowski 17:55 No, I don't really think so. I think we covered the main points of it.
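The earlier point about two Fibers sharing the same object can be illustrated with a minimal sketch. It assumes the Fiber API as it later shipped in PHP 8.1 (`new Fiber`, `start()`, `resume()`, `Fiber::suspend()`), which may differ from the RFC draft discussed in this episode. The `$counter` object and the `$worker` closure are hypothetical names made up for this illustration, not part of the RFC or of AMP.

```php
<?php
// Hypothetical shared state: one plain object handed to two Fibers.
$counter = new stdClass();
$counter->value = 0;

// Each Fiber reads the shared value, suspends, then writes back a result
// computed from the value it read before suspending.
$worker = function () use ($counter): void {
    $seen = $counter->value;      // read shared state
    Fiber::suspend();             // give up control; other Fibers may run now
    $counter->value = $seen + 1;  // write based on a possibly stale read
};

$a = new Fiber($worker);
$b = new Fiber($worker);

$a->start();   // $a reads 0, then suspends
$b->start();   // $b also reads 0, then suspends
$a->resume();  // $a writes 1
$b->resume();  // $b also writes 1, so one increment is lost

echo $counter->value, PHP_EOL; // prints 1, not 2
```

Nothing runs in parallel here, yet an update is still lost because each Fiber acted on a value it read before suspending. Handing data over through a channel-like abstraction, as suggested in the conversation, avoids having both Fibers hold the same mutable object.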
Derick Rethans 17:59 I have to say I understand that quite a lot better now, which is always good, and hopefully the people listening to this episode will also find it interesting and understand it well. So I would say thanks for explaining Fibers to me today.
Aaron Piotrowski 18:13 Yeah, thanks a lot for having me on.
Derick Rethans 18:18 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.
Show Notes
RFC: Fibers
Credits
Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0