This is the 'PHP Internals News' podcast, where we discuss the latest PHP news, implementations, and issues with PHP internals developers and other guests.

PHP Internals News: Episode 43: Syntax Tweaks

March 05, 2020

London, UK Thursday, March 5th 2020, 09:06 GMT
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: one on abstract methods in traits, and one about an improvement to the tokenizer.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 43. Today I'm talking with Nikita Popov yet again about a few RFCs that he's produced for PHP 8. Good morning, Nikita. How are you doing?
Nikita 0:34 Good morning, Derick. I'm doing great.
Derick Rethans 0:37 I've given up on introducing you because we've done this so many times. Now, you don't need an introduction any more. The first RFC I wanted to talk about a little bit this morning is the abstract trait methods validation RFC. What are traits?
Nikita 0:51 We usually talk about traits as compiler-assisted copy and paste. Basically, we just take all the methods and properties from a trait and copy them into the class that's using the trait. That's a bit oversimplified; in particular, you can use multiple traits in a single class. And those traits might be defining the same method, in which case you have to resolve the conflict in some way. So that's where you have these insteadof and as clauses to specify precedence and aliases.
Derick Rethans 1:23 Traits have been in PHP for quite a long time. What is now the problem that you're trying to solve through this RFC?
Nikita 1:29 The problem is that traits are sometimes not self-contained.
So to give a specific example, in the logger PSR we have a trait called LoggerTrait, which has a bunch of methods like warning, error, info, notice, and so on. So just simple helper methods, which all call the log method with a specific log level, and this trait only specifies these helper methods but still requires the actual class to implement the log method. The way you usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait, and you have a number of abstract methods that the trait itself requires to work. This already works fine, but the problem is just that these methods are not actually validated, or they are only inconsistently validated. Even though the trait specifies these abstract methods, you could implement them in the class with a completely different signature.
Derick Rethans 2:30 Okay, just any signature?
Nikita 2:32 Just any signature, right. The method still has to be present in some way, but the signature can be completely different. It could also be a different method type, like a static method or an instance method.
Derick Rethans 2:43 It basically just checks for the name, is what you're saying?
Nikita 2:46 Yeah, it only checks the name.
Derick Rethans 2:49 Is this the only place, is this the only time where these abstract methods are not being validated? Or are there other situations where that could happen as well?
Nikita 2:57 No, I think this is the only place.
Derick Rethans 3:00 Are there situations where these abstract methods in the trait do get validated, including their signature?
Nikita 3:07 As I mentioned, it's not like the signatures are completely unvalidated. They are just inconsistently validated. It depends a lot on exactly how you use the trait. If you just use the trait and implement the method in the same class, it doesn't get validated right now.
If instead the method is provided by the parent class, so it's inherited, then it does get validated. If you don't implement the method, that makes the class abstract instead, and then it's also going to get validated in the child class. It kind of already works halfway, and this RFC just tries to make it work always.
Derick Rethans 3:44 Okay, that seems like a reasonably good addition, almost a no-brainer.
Nikita 3:48 I would say it's basically a bug. Especially if you look at the implementation, there is clearly some validation code there. The conditions are just a little bit off. But we do have to go through the proposal, because this is a backwards compatibility break.
Derick Rethans 4:02 Yes, I was about to ask: if it's a bug fix, why bother with an RFC? But if it's a BC break, then yeah, we still need to do it, of course. I doubt there will be much controversy about it?
Nikita 4:12 Actually, there is one contentious point. Something I didn't mention yet is that the RFC also allows you to define private abstract methods in traits. Normally private abstract is a contradiction in terms, because private means only visible in the same class, and abstract means it has to be implemented in the child class; you can't really have both. You can have both with traits, because traits can see the private members of the class. I think that by itself is not controversial. That's a reasonable thing to have in a trait. The part that is controversial is what you do with existing visibility modifiers. This pattern already exists. So people already define abstract methods in traits, but because right now private abstract is forbidden, the lowest visibility they can use is protected abstract, even though they don't actually want that method to be publicly exposed, or even exposed as protected.
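To make the pattern concrete, here is a minimal sketch of a trait that provides helper methods and declares the method it depends on as abstract, written for PHP 8.0 or later, the release in which this RFC's signature validation and private abstract trait methods shipped. The names `LevelHelpers` and `ArrayLogger` and the simplified `log()` signature are hypothetical; this is not the psr/log `LoggerTrait` itself.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified helper trait in the spirit of PSR-3's LoggerTrait:
// the trait provides convenience methods, the using class supplies log().
trait LevelHelpers
{
    // PHP 8.0+: "private abstract" is allowed in traits, and the implementing
    // method's signature is checked against this declaration.
    abstract private function log(string $level, string $message): void;

    public function warning(string $message): void
    {
        $this->log('warning', $message);
    }

    public function error(string $message): void
    {
        $this->log('error', $message);
    }
}

final class ArrayLogger
{
    use LevelHelpers;

    /** @var list<string> Collected log lines. */
    public array $lines = [];

    // Compatible implementation. Making it static, or adding a required third
    // parameter, would be a fatal error in PHP 8.0+.
    private function log(string $level, string $message): void
    {
        $this->lines[] = "[$level] $message";
    }
}

$logger = new ArrayLogger();
$logger->warning('disk almost full');
$logger->error('disk full');
print_r($logger->lines);
```

Under PHP 7.x the `abstract private` declaration itself is rejected, which is why existing code used `protected abstract` for this kind of trait requirement.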
So there is an argument there that we should maybe ignore the normal visibility validation that we do, and allow implementing even a protected abstract method from a trait with a private method inside the class, simply for backwards compatibility reasons.
Derick Rethans 5:21 Because if you didn't allow that, how would it break things?
Nikita 5:26 It would break things because there is existing code using these abstract protected methods, simply because we don't support abstract private yet. So that code would start throwing visibility errors. I mean, it could be fixed by just dropping the abstract method, but that's also not ideal.
Derick Rethans 5:45 Because people use it to make sure that, I mean, it's there in the class that implements the trait, pretty much. Do you have any idea when this is going to go to a vote?
Nikita 5:53 I think it could already go up for a vote. Mainly I need to resolve that question about the visibility first.
Derick Rethans 5:59 I'm looking forward to seeing that showing up sometime soon then. What do you call your second RFC?
Nikita 6:05 Object-based token_get_all alternative.
Derick Rethans 6:07 I think that's a great title. There are a few words in there that we might have to explain first. What are these tokens you're talking about?
Nikita 6:14 So the token_get_all function, which we already have, exposes a part of the PHP compiler infrastructure. PHP compilation generally has three steps. The first is the tokenization, the second part is the parser, and then the compiler. The tokenizer converts the raw character stream into tokens, which encode higher-level concepts: for example, that the sequence of characters f, u, n, c and so on is actually the function keyword, or that a double quote followed by characters is actually a string. So this part doesn't recognise larger structures, like whole functions, but only the atoms that make up the language.
Derick Rethans 7:00 Would you say these are the words that make up the sentences?
Nikita 7:03 Yeah, that's the right analogy.
Derick Rethans 7:06 Why would you want to have access to them?
Nikita 7:08 For example, I have a PHP parser library, which converts these tokens into an actual syntax tree. And then on top of that, you can easily analyse PHP source code. So this is what all these static analyzers, like PHPStan or Psalm, are based on.
Derick Rethans 7:27 Do they all use the tokens?
Nikita 7:29 Those two, in particular, use my PHP parser library, and that one uses the tokens internally. There is also other tooling that's more directly based on tokens, for example code formatters or code style inspection tools like PHPCS. Those all directly operate on the tokens instead.
Derick Rethans 7:47 But as you say, these tokens are only words, and they don't really provide a structure. How would these tools then convert that into a structure?
Nikita 7:54 If you're looking just at formatting, then you may not really need a lot of structure. You probably do need to write a bit of extra code to recognise that, okay, the function token followed by whitespace, followed by an identifier, is a function declaration. For the more complicated tooling that builds a syntax tree, you need to implement a parser, either based on code generation, or based on a recursive descent approach.
Derick Rethans 8:26 Why would you not want to have direct access to PHP's AST instead, because that already provides a structure for you?
Nikita 8:33 We do have direct access to the AST through the AST PECL extension, which is not part of core yet. I don't know if there are plans in that direction.
Derick Rethans 8:43 Well, you wrote it, so you surely can make these plans.
Nikita 8:46 Yes, I can make them, but I don't know if I should make them.
Derick Rethans 8:50 I think you should.
Nikita 8:51 I mean, the nominal advantage of the AST extension is that it's always up to date with PHP.
In practice that really isn't an issue, because some of the userland tooling is also pretty quickly updated. The more practical advantage is that the extension is a lot faster than implementing this in userland code. Well, I mean, this is really one of the areas where C code is faster than PHP code. On the other hand, the AST extension only exposes the structure that PHP itself needs. PHP is not interested in precise formatting and things like that at all, so it throws away quite a few things. You can't, for example, get accurate position information: where exactly, not just on which line but in which column, something is defined. And that's something you're quite often interested in.
Derick Rethans 9:46 Also, from what I know, it throws away all the comments unless they are doc block comments. How does the tokenizer currently return information about the tokens? I've played with this in the past and I didn't think it was the prettiest format to get back out of it.
Nikita 10:02 token_get_all returns an array of tokens, and there are generally two types of tokens. One is single-character tokens, like a semicolon, or a comma, or whatever, which are just returned as a string, so it's a single-character string. And then there are complex tokens, like the function keyword, like whitespace, like strings, which are returned as an array where the first element is the token ID, which is an integer, and we have constants defined for these integers. The second element is the actual string content of the token. For the function keyword, that's always going to be function, but it could be written in different ways because the keyword is case-insensitive, so it could be all lowercase, or uppercase; hopefully it's all lowercase.
Derick Rethans 10:52 You'll get the odd situation where the first letter is a capital, I suppose, but that's about it, hopefully.
Nikita 10:57 And finally, the last element is the line number, so the starting line number.
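Because of this mixed shape, code that consumes `token_get_all()` usually normalizes it first. The sketch below is one way to do that in userland, assuming the tokenizer extension is loaded; `normalizeTokens()` and its record layout are illustrative names, not part of PHP. It gives single-character tokens `ord($char)` as their ID (the same convention PHP 8's `PhpToken` uses), reuses the running line number for them since none is reported, and tracks each token's byte offset, which the array format does not provide.

```php
<?php
declare(strict_types=1);

// Illustrative helper (not part of PHP): converts token_get_all()'s mixed
// output into uniform records with id, text, line and byte offset (pos).
function normalizeTokens(string $code): array
{
    $records = [];
    $line = 1; // line on which the next token starts
    $pos = 0;  // byte offset of the next token
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Complex token: [T_* constant, text, starting line].
            [$id, $text, $line] = $token;
        } else {
            // Single-character token: a bare string with no ID or line number.
            $id = ord($token);
            $text = $token;
        }
        $records[] = ['id' => $id, 'text' => $text, 'line' => $line, 'pos' => $pos];
        // Advance the offset and line counter past this token's text.
        $pos += strlen($text);
        $line += substr_count($text, "\n");
    }
    return $records;
}

foreach (normalizeTokens('<?php echo 1;') as $t) {
    printf("%-5d %-10s line %d  pos %d\n", $t['id'], var_export($t['text'], true), $t['line'], $t['pos']);
}
```

Every caller then works with one record shape, which is the kind of normalisation the RFC moves into the extension itself.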
Derick Rethans 11:02 So if you want to know the position on the line, you'd have to calculate it yourself?
Nikita 11:08 Right, you would have to track that yourself. I mean, there are two problems. One is just that the single-character tokens and the complex tokens use a different structure. So all the code using them has to always switch between those: check if it's an array or a string, or do some kind of normalisation itself. And the second problem is that arrays in PHP are fairly memory-inefficient when it comes to storing a fixed amount of data. Storing three elements inside an array always means allocating an array for eight elements, because that's the minimum array size, and you also have to use space to store the key, and so on. Generally, if you have a fixed structure, it's much, much more efficient to store it inside an object, using a class that has declared properties. This makes a very large difference in some cases, especially if your array only has like two or three elements; you can save a lot of memory with it.
Derick Rethans 12:12 Have you done any benchmarks to see how much memory you'd actually save for some particular scripts, comparing how the tokenizer does it now and how you're proposing to do it?
Nikita 12:22 Yeah, I have some numbers here in the RFC for a particular script, where it goes down from 14 megabytes to eight megabytes. So that's nearly half the memory usage. Well, actually, maybe I should first say what the RFC proposes. The RFC proposes to instead return objects, an array of objects. And these objects have four properties: first, again, the ID of the token, then the textual content, the line number, and also the starting position of the token in the string.
Derick Rethans 12:54 Is this something that the tokenizer extension already tracks for you?
Nikita 12:58 I mean, that's something we can easily do, so we might as well expose it. And these objects are always used.
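The array-versus-object difference can be observed from userland. This sketch builds the same number of three-field records once as arrays and once as instances of a small class with declared properties, then compares the `memory_get_usage()` deltas. The class `Tok`, the helper `bytesUsedBy()` and the record count are arbitrary choices for illustration; the absolute numbers depend on the PHP version and build, so none are quoted here, but the object variant should come out clearly smaller.

```php
<?php
declare(strict_types=1);

// Hypothetical fixed-shape record with declared properties (PHP 8.0+ syntax).
final class Tok
{
    public function __construct(
        public int $id,
        public string $text,
        public int $line,
    ) {}
}

const RECORDS = 50_000;

// Returns how many bytes of request memory the list produced by $build occupies.
function bytesUsedBy(callable $build): int
{
    $before = memory_get_usage();
    $list = $build();
    $used = memory_get_usage() - $before;
    unset($list);
    return $used;
}

$arrayBytes = bytesUsedBy(static function (): array {
    $list = [];
    for ($i = 0; $i < RECORDS; $i++) {
        // Using $i makes every record a distinct array rather than a shared literal.
        $list[] = [1, 'foo', $i];
    }
    return $list;
});

$objectBytes = bytesUsedBy(static function (): array {
    $list = [];
    for ($i = 0; $i < RECORDS; $i++) {
        $list[] = new Tok(1, 'foo', $i);
    }
    return $list;
});

printf("arrays:  %d bytes\nobjects: %d bytes\n", $arrayBytes, $objectBytes);
```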
So we no longer make the distinction between single-character tokens and complex tokens; we always return a uniform array of token objects. Despite removing this optimization for single-character tokens, the end result is still that we use half as much memory, simply because objects are that much more efficient than arrays.
Derick Rethans 13:27 That's a clever trick. I'm sure people like using less memory; at least I know I would. Is it also faster, or doesn't it particularly matter?
Nikita 13:35 It's also faster, maybe 30% or something, because memory usage and performance tend to be pretty heavily correlated. So if you use less memory, you are also faster.
Derick Rethans 13:46 That makes sense. Are you thinking of other things that you can add to the tokenizer extension to make working with tokens even easier?
Nikita 13:52 The way this new functionality is implemented is, we have a PHPToken class, and on it we have a static method getAll. So instead of calling the token_get_all function, you call PHPToken::getAll(). And one nice thing this allows you to do is to extend this token class. So you can say MyPHPToken extends PHPToken, and then you call MyPHPToken::getAll(), and then we will actually construct your extension class. That means that you can add whatever methods you like, in addition to what we provide by default.
Derick Rethans 14:29 Is that a pattern that we have in other places in PHP as well? Because I wouldn't usually expect that calling an inherited static method suddenly returns an object of the inheriting class. Do we do it in other places?
Nikita 14:42 So this is somewhat uncommon in PHP internals. I think it's a pretty common pattern in userland, where generally if you return new objects from static methods, you always use new static, not new self. This is essentially late static binding, which we did discuss quite recently.
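A minimal sketch of the subclassing idea, written against the API as it was eventually released in PHP 8.0. The shipped class is spelled `PhpToken` and its factory method is `PhpToken::tokenize()`, rather than the `getAll()` name used in this conversation. The subclass `MyToken` and its `isVariable()` helper are hypothetical. Because the factory is resolved through late static binding, the returned objects are `MyToken` instances; the constructor itself is final and cannot be overridden.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.0+ with the tokenizer extension.
// Hypothetical subclass: adds a helper method; the constructor stays final.
class MyToken extends PhpToken
{
    public function isVariable(): bool
    {
        return $this->id === T_VARIABLE;
    }
}

// tokenize() is called on the subclass, so every element is a MyToken.
$tokens = MyToken::tokenize('<?php $total = 1;');

foreach ($tokens as $token) {
    // id, text, line and pos are the four properties the RFC proposes.
    printf(
        "%-14s %-10s line %d  pos %2d%s\n",
        $token->getTokenName(),
        var_export($token->text, true),
        $token->line,
        $token->pos,
        $token->isVariable() ? '  <- variable' : ''
    );
}
```

Single-character tokens such as `=` and `;` come back as objects too, with the character's code as their `id`, so the consuming loop needs no array-or-string check.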
So there is one limitation here, namely that the constructor of the PHPToken class is final. You can extend the class and you can add extra methods, but you cannot modify the construction behaviour, because we would like to construct these tokens internally very efficiently, by more or less directly writing the values into the right slots in memory and not doing slow constructor calls, because this functionality tends to be very performance-sensitive. The same trick, where you can extend the class but not change the constructor, is also used by the SimpleXML extension. So it does exist, but it's not very common. Generally, where internal code is concerned, we usually do not really plan for extension. I think nowadays we mark nearly all new internal classes as final, simply because extension is such a pain to deal with. And for old classes, we usually wish that we had marked them as final. I mean, this is also a general recommendation for userland, that you should mark things final as much as you can get away with. But it's a much bigger concern for internals, because dealing with userland extensions that do unexpected things is much harder for us.
Derick Rethans 16:23 You also need to make sure that your internal structures are properly constructed, which relies on the parent's constructor being called from inherited classes, but in PHP there's no requirement that you do that. I'm pretty sure I've had problems with that in the Date extension a long, long time ago, where people would extend from it and not call the constructor. And then, because I didn't think of it, nothing is defined and everything just falls down.
Nikita 16:44 Yeah, so this is one of the common problems. And the other one is that internal classes often define custom object handlers. That's something only internal classes can do.
Just to give one example, they can define a debug info handler that modifies the output of var_dump, but nowadays we also have the userland magic method __debugInfo(), and I think pretty much all internal classes are just going to ignore that and always return their own internal debug information, even if this method has been overridden, simply because no internal class actually checks for that. And this kind of problem also exists for a lot of other magic, and generally no one takes it into account, and things are just more or less subtly broken.
Derick Rethans 17:31 Very recently there was a pull request for Xdebug to change that as well, because Xdebug's debugging output gets sent to IDEs. For internal classes it always uses the internal get_debug_info handler, and for userland classes it uses whatever is defined in userland; I mean, if there's a magic method, we'll use that. The pull request wanted to change Xdebug in such a way that it would also use the __debugInfo() magic method for internal classes, whenever it is overridden. After some discussion about this, we figured out this is probably a bad idea, and hence we haven't merged that, although we ended up fixing some other things that the developer also found. That's a curious situation to be in. We would like them to sort of work the same, but at the same time, you sometimes really want to see the internal information from the classes without developers having hidden the information behind it, right?
Nikita 18:20 Yeah, that's true.
Derick Rethans 18:21 And that is just from a debug perspective, and even from a "let's make sure things don't crash" perspective. I see that the RFC also lists a few rejected features that aren't part of the current iteration, but might make sense to add later. One of them is about a lazy token stream. What would that be, and what sort of different interface would it have?
Nikita 18:43 The lazy token stream basically just means that instead of returning an array of tokens, we return an iterator of tokens, which means that we do not have to store the full array in memory. For the example I used, the memory usage for the whole token array was eight megabytes, even after these memory usage improvements. That was a fairly large file, but definitely not the largest file you can encounter, especially when it comes to generated files. So there is an advantage to processing tokens one by one as a stream, because then your memory usage is going to be basically O(1), not O(n). I mean, the PHP lexer does indeed work one token at a time, so it can support it. The problem is that it has a lot of internal state, and in order to implement this iterator, we would have to back up and restore the state on each produced token, to make sure that it's still possible to, for example, include and compile other files in the meantime. This is something that can be improved; we can make that cheaper, but that would be a larger effort. And I'm not really sure it's worthwhile because, while you can process one token at a time, and this is, for example, what the PHP parser does internally, many practical applications in userland will generally want to have all tokens as an array, because it simply makes things easier if you can always look ahead and look back. And I think it would be fairly hard to rewrite the existing libraries in terms of a lazy stream. It may be nice to have, but I don't think it's the most useful thing right now.
Derick Rethans 20:32 What has been the feedback for this RFC?
Nikita 20:35 I think pretty good. This is something that we already discussed years ago. Last time the discussion got a bit sidetracked. That's one of the dangers when you start introducing object-oriented interfaces.
Well, actually, I call this RFC object-based intentionally, because when you do object-oriented, people would like to have their tokens, and their token streams, and their token stream factories, and their token stream managers. And this is basically what held this up the whole time. But generally everyone who is working with tokens, which is not a lot of people, but those who are working with them know that memory usage is a problem, and the current inconsistent structure is a problem, which is why most of them implement their own token objects and basically do the same thing we propose here, just themselves.
Derick Rethans 21:30 When is this one going up for a vote? At the same time?
Nikita 21:32 Soon.
Derick Rethans 21:33 Both of the RFCs that we've spoken about today are targeted at PHP 8, I suppose?
Nikita 21:37 Yeah. Right now, I think all RFCs are targeted at PHP 8.
Derick Rethans 21:42 Thank you for taking the time with me today, Nikita, to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we'll have started voting on them, and we'll see what happens to them.
Nikita 21:52 Thanks for having me once again.
Derick Rethans 21:56 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.
Show Notes
RFC: Validation for abstract trait methods
RFC: Object-based token_get_all() alternative
PSR-3 Logger Interface
PHPStan — PHP Static Analysis Tool
Psalm
PHP-AST
PHPCS
Credits
Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 42: Userspace Operator Overloading

February 27, 2020

London, UK Thursday, February 27th 2020, 09:05 GMT
In this episode of "PHP Internals News" I chat with Jan Böhmer (GitHub, LinkedIn) about the Userspace Operator Overloading RFC.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 42. Today I'm talking with Jan Böhmer about Userspace Operator Overloading. Jan, would you please introduce yourself?
Jan Böhmer 0:33 Hi, my name is Jan Böhmer. I'm a physics student from Germany, and I'm the author of the Operator Overloading RFC.
Derick Rethans 0:40 What brought you to writing this RFC?
Jan Böhmer 0:42 Mostly because I have worked with monetary objects in the past, and it was a bit tedious to work with them when it comes to calculating. Whenever you want to calculate something, you have to call methods on objects. It is not possible to just use operators, like with normal values such as floats or integers.
Derick Rethans 1:06 Because the monetary objects themselves had multiple things embedded in there, or something like that?
Jan Böhmer 1:11 Yes, they mostly describe a value and a currency, and together they are saved in an object.
Derick Rethans 1:18 Okay, that seems like a reasonable thing to do, right? I mean, at other times people say the same thing about doing complex numbers or something like vectors. The RFC is called Userspace Operator Overloading. What is operator overloading?
Jan Böhmer 1:31 Yeah. Basically, it is the idea that you can define what operators, like addition or subtraction, or string concatenation, do for objects.
Derick Rethans 1:43 Does PHP already have something like this?
Jan Böhmer 1:45 Actually, yes. Objects can have something called a do_operation handler. This is called whenever PHP encounters an object being used with an operator. The problem is that this handler is only available inside PHP internally, so if you want to use it, you have to write an extension.
Derick Rethans 2:06 So it would be possible to have, in an extension, a Monetary class with its own operators already defined on it.
Jan Böhmer 2:14 Exactly. The PHP extension GMP already uses this. The problem is that it's not very flexible: you have to be familiar with C, you have to be able to compile it, and you have to deploy it to whatever system you want to use it on. Since PHP 7.4 we have the foreign function interface, so we can implement many things without actually having an extension. But operator overloading is something that's not possible yet from inside PHP.
Derick Rethans 2:47 So it wouldn't have been possible to write the GMP extension, which is of course a wrapper around libgmp, with FFI, because there's no operator overloading available in PHP.
Jan Böhmer 2:59 Not in that comfortable way. You could do it this way with functions, but it would be a bit more tedious than just using operators.
Derick Rethans 3:09 You've mentioned the Monetary object as a good use case. What other use cases can you think of?
Jan Böhmer 3:15 Higher mathematical objects like complex numbers, vectors, or something like tensors; maybe something like the String component of Symfony, so that you can simply concatenate these string objects with a normal string using the concatenation operator and don't have to call a function for that, because basically this should behave similarly to a basic string variable, and not like something completely different.
Derick Rethans 3:45 What is the syntax you're proposing for implementing this?
Jan Böhmer 3:49 My idea was, similar to Python, to use special magic methods for every operator you can overload.
So if you want to overload the addition operator, you would implement a static function called __add, for example. This function takes both operands, the left-hand operand and the right-hand operand, so you can decide whether your current object is on the left- or the right-hand side. That is important for something like one divided by two, or two divided by one. These are two completely different cases, and you have to be able to differentiate between them.
Derick Rethans 4:39 And would that not be possible with non-static functions?
Jan Böhmer 4:43 Another problem with non-static functions is the possible access to the $this variable. If you modify an object from inside an operator handler, this can lead to very, very strange behaviour, because normally operations don't change the object itself; rather, you should return a new value. The problem with access to $this is that it is very easy to accidentally change the $this object. If you only pass both objects via a static method, it is a bit more clear that you have to create an all-new object.
Derick Rethans 5:24 Would a type hint enforce that you return an object of the same class?
Jan Böhmer 5:29 Not in all cases do you want to return an object of the same class. For example, take the dot product of vectors: you take two vectors, multiply them in some way, and you return a normal float value.
Derick Rethans 5:43 Of course, yes.
Jan Böhmer 5:44 If you were to enforce that it would always be the same type, that limits the use cases too much, in my opinion.
Derick Rethans 5:52 But you could of course type hint the __add operator yourself?
Jan Böhmer 5:57 The type hints on the arguments are, in my implementation, used as a hint for which types are supported by the operator handler.
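PHP itself does not dispatch `+` to a userland method, so the proposed semantics can only be illustrated by invoking the handler by hand. The sketch below assumes the convention described here: a static `__add()` that receives both operands and returns a new value. The hypothetical `emulatedAdd()` function stands in for the engine and follows the dispatch order described in the interview: try the left operand's class, then the right operand's, skip a handler whose class-typed parameters don't match the operands, and throw if neither applies. `Money`, `handlerAccepts()` and `emulatedAdd()` are illustrative names, not the RFC's implementation, and builtin or union parameter types are deliberately not checked.

```php
<?php
declare(strict_types=1);

// Hypothetical value object written in the style the RFC proposed.
final class Money
{
    public function __construct(public int $cents, public string $currency) {}

    // Static handler: receives both operands and returns a NEW object
    // instead of mutating either operand.
    public static function __add(Money $lhs, Money $rhs): Money
    {
        if ($lhs->currency !== $rhs->currency) {
            throw new DomainException('Currency mismatch');
        }
        return new Money($lhs->cents + $rhs->cents, $lhs->currency);
    }
}

// True if every class-typed parameter of $class::__add() accepts the matching
// operand. Builtin and union types are treated as "accepts" in this sketch.
function handlerAccepts(string $class, mixed $lhs, mixed $rhs): bool
{
    $params = (new ReflectionMethod($class, '__add'))->getParameters();
    foreach ([$lhs, $rhs] as $i => $operand) {
        $type = $params[$i]->getType();
        if ($type instanceof ReflectionNamedType && !$type->isBuiltin()
            && !is_a($operand, $type->getName())) {
            return false;
        }
    }
    return true;
}

// Stand-in for what the engine would do for `$lhs + $rhs` under the proposal:
// ask the left operand's class first, then the right operand's.
function emulatedAdd(mixed $lhs, mixed $rhs): mixed
{
    foreach ([$lhs, $rhs] as $operand) {
        if (is_object($operand) && method_exists($operand, '__add')
            && handlerAccepts($operand::class, $lhs, $rhs)) {
            return $operand::__add($lhs, $rhs);
        }
    }
    throw new TypeError('Unsupported operand types for +');
}

$total = emulatedAdd(new Money(150, 'EUR'), new Money(250, 'EUR'));
echo $total->cents, ' ', $total->currency, PHP_EOL; // 400 EUR
```

Calling `emulatedAdd(new Money(1, 'EUR'), 5)` throws a `TypeError` in this sketch, because `Money::__add()` only declares `Money` parameters and the integer operand has no handler of its own.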
If you have, for example, a vector plus an integer, and your operator handler only declares vector, vector as the parameter types, then this handler will not be called, and PHP will try to call the handler on the second object.
Derick Rethans 6:24 So it won't be called, and instead it falls back to the second object's handler?
Jan Böhmer 6:29 Yes. The idea behind it is that only one of the objects has to know about both classes. So if you want to combine, for example, two objects from different libraries, and library A doesn't know about library B, then only the objects of library B have to know about library A. In C++ you can define operators outside of classes, so you can define combinations between arbitrary types. In PHP this would be a bit complicated, and the best way is to implement these handlers in the classes. So a class has to know about every other class it could possibly interact with.
Derick Rethans 7:14 That makes sense to me. What happens if neither of the classes, or if one of them is a class and the other one is just a scalar type, if none of the add methods fit, what would happen then?
Jan Böhmer 7:24 If the objects implement a handler, but those handlers don't support the types, then an error would be thrown.
Derick Rethans 7:32 And that is a type error like you'd normally get?
Jan Böhmer 7:34 If the object doesn't implement the operator at all, then a notice would be triggered. The idea is that at the moment, it is possible to write something like object plus one. This is a valid expression in current PHP versions: the object is interpreted as a one and just a notice is thrown. For compatibility reasons, my RFC keeps the same behaviour if no operators are overloaded on an object.
Derick Rethans 8:05 That seems like a reasonable compromise there. I remember from the past, I think it was Sara Golemon that wrote an extension for using operator overloading.
And I remember from that time that there was a problem with using the less than or greater than operators, because I think one of them gets flipped around automatically in the engine. Has that been changed in PHP already, or are you running into the same problem?
Jan Böhmer 8:28 I'm not sure about this. My RFC doesn't cover comparison operators like greater or less at all, because comparison is handled differently inside PHP; it doesn't go through the do_operation handler I mentioned. It would need a different implementation. Also, comparison is a bit complicated on its own terms. Maybe it's more useful to use interfaces to implement this overloading. There are also some problems. Maybe we should only allow something like a compare operator that returns either minus one, zero, or one, depending on whether the object is less, equal, or greater, so that everything is defined together at once. So it's not possible to define an object that has, for example, the less than but not the greater than operator.
Derick Rethans 9:32 But this sounds like that's for a different RFC.
Jan Böhmer 9:35 Exactly. That's a bit complicated. If the current operator overloading RFC gets passed, then maybe a comparison operator overloading RFC would make sense.
Derick Rethans 9:46 From reading the RFC, I've noticed that you also won't be able to overload a shorthand assignment operator, so for example plus equals. What is the reason for that?
Jan Böhmer 9:56 Currently, every shorthand operator becomes an assignment of A plus B, so the do_operation handler cannot tell whether a shorthand operator or a normal operator was used. Allowing the shorthand operators to be overloaded would maybe give some benefits for objects in terms of memory optimization: if you call a shorthand operator, you can mutate the object itself and don't have to create a new object, which takes more memory. But I think with the garbage collector of PHP that is not such a big problem.
And if it really is a needed feature, this could be added in a later version of PHP.

Derick Rethans 10:41 Okay.

Jan Böhmer 10:42 Many other languages don't allow overloading shorthand operators either, so I don't think there is too much need for it.

Derick Rethans 10:49 Operator overloading sometimes has criticism directed at it. What are some of the criticisms you've heard about it?

Jan Böhmer 10:56 First of all, there are some criticisms of the operator overloading idea in general. There is also the criticism that it could be abused for doing some very weird things. As mentioned, in C++ the left shift operator is used for writing to an output stream, like the console. Or you could do whatever you want inside the handler: if somebody wanted to save or modify files inside an overloaded operator handler, it would be possible, and in most cases a function would make it clearer what it does.

Derick Rethans 11:35 Of course, in an add() function that you implement yourself, nothing stops you from writing to a file either.

Jan Böhmer 11:41 In my opinion, operator overloading should only be used for things related to maths, or for creating custom types that behave similarly to the built-in types.

Derick Rethans 11:52 Like complex numbers, or vectors, or monetary values. We have been discussing this RFC for a few weeks now. What do you think the chances are of it passing?

Jan Böhmer 12:05 I'm not sure. I think the idea of operator overloading in general is accepted in the community; I didn't hear that much backlash. There was some discussion about how to do it. Some people think it would be better to implement operator overloading with interfaces, like with ArrayAccess, or to introduce some completely new keywords, like in other languages. In C++ or C#, there is a special keyword, operator, that marks an operator overloading function.
So it is clear that it is not a real function, but one that is handled in a special way.

Derick Rethans 12:49 Instead of using the underscore underscore in front of method names. When do you think you'll be ready to put this up for a vote?

Jan Böhmer 12:56 I've been busy the last few days. I will make some revisions to my RFC and polish my implementation.

Derick Rethans 13:06 Okay, thank you very much for taking the time to talk to me this morning, Jan.

Jan Böhmer 13:10 Thank you very much for inviting me.

Derick Rethans 13:13 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.

Show Notes

RFC: Userspace Operator Overloading

Credits

Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 41: __toArray()

February 20, 2020

London, UK
Thursday, February 20th 2020, 09:04 GMT

In this episode of "PHP Internals News" I chat with Steven Wade (Twitter, GitHub, Website) about the __toArray() RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 41. Today I'm talking with Steven Wade about an RFC that he's produced, called __toArray(). Hi, Steven, would you please introduce yourself?

Steven Wade 0:35 Hi, my name is Steven Wade. I'm a software engineer for a company called Follow Up Boss. I've been using PHP since 2007, and I love the language, so I wanted to be able to give back to it with this RFC.

Derick Rethans 0:48 What brought you to the point of introducing this RFC?

Steven Wade 0:50 This is a feature that I've kind of wished had been in the language for years. Talking with a few people, it's kind of like the rule of starting a user group, right? If there's not one and you have the desire, then you're the person to do it. A few people encouraged me and said: Well, why don't you go out and write it? So I've spent the last two years trying to work up the courage, research it enough, make sure I write the RFC the proper way, and then also actually have the time to commit to writing it and following up on the discussions.

Derick Rethans 1:18 Okay, so we've mentioned the word RFC a few times, but we haven't actually spoken about what it is about. What are you wanting to introduce into PHP?

Steven Wade 1:25 I want to introduce a new magic method. As you said, the name of the RFC is __toArray().
The idea is that if your class implements this method, you can cast an object to an array, just like you can with toString(). If you cast it manually to an array, that method will be called if it's implemented. Or, as I said in the RFC, array functions can automatically cast it if you're not using strict types.

Derick Rethans 1:49 Oh, so only if it's not strictly typed. So if it's weakly typed, it would call the toArray() method if the function's argument is type hinted as array.

Steven Wade 1:58 Yes, and that is actually something that came up during the discussion period. Again, this is why we have discussions, right? To solicit feedback on things we don't think about, or may overlook. Someone did point out that it would not function that way: you would not expect it to be automatically cast for you if you're using strict types.

Derick Rethans 2:17 Okay.

Steven Wade 2:18 The RFC has been updated to reflect that as well.

Derick Rethans 2:20 So now the RFC says it won't be automatically called just for a type hint.

Steven Wade 2:24 Correct.

Derick Rethans 2:24 Not everybody is particularly fond of magic methods. What would you say about the criticism that introducing even more of them would be counterproductive, because you end up not necessarily knowing what happens, for example when you do a cast?

Steven Wade 2:38 The beauty of PHP is in its simplicity. Adding more and more interfaces expands class declarations and enforcements, and in my opinion can lead to a lot of clutter. I think PHP is already very magical, and the precedent for adding more magic was set in 7.4 with the introduction of the __serialize() and __unserialize() magic methods. For me it's just a tool. I don't think that it's necessarily a bad thing or a good thing. It's just another option for the developer to use.
Derick Rethans 3:06 Two episodes ago, I spoke with Nicolas Grekas about a Stringable interface that he suggested introducing, which is a little bit similar to the casting with toArray(). Do you think it would make sense to have __toArray() also be called if the class implements an interface, with a typed function argument?

Steven Wade 3:29 I think that would be two separate RFCs. The first one, to get it on par with what we have now in PHP, would be to introduce toArray(). A separate one would be if we wanted to follow suit with an Arrayable interface.

Derick Rethans 3:43 Which is the same thing that happens with the Stringable interface, right? We have had toString() for how many years, decades? But from what I understand, if you have a typed property "string", it would also call the toString() method when it's defined on an object that's being passed in, or do I misunderstand or misremember that?

Steven Wade 4:00 I haven't followed that one too closely. I've been catching up on some of the discussion today, but I don't know off the top of my head what that would do.

Derick Rethans 4:07 I didn't mean with the newly suggested Stringable interface, but with what we currently have.

Steven Wade 4:12 I'm not sure how that would work.

Derick Rethans 4:13 I don't know either. That's what I'm asking you.

Steven Wade 4:15 With arrays and with typed properties? That's a good question. That's again some feedback that I need to think through.

Derick Rethans 4:21 Because I think it would make sense to at least behave the same, and I don't particularly mind which way it goes. That's a personal opinion here.
Steven Wade 4:28 That's a great idea. I haven't played with 7.4 too much; I need to pull it down and just see what the behaviour with strings is, because the main goal of this is to get functionality parity with what toString() does. So if that is how it handles typed properties, I would want to implement that as well.

Derick Rethans 4:47 In a similar way. I also don't know what happens if you have toString() available in a class and you pass an object in as an argument that is typed as string.

Steven Wade 4:54 At least when I tested it with weak types, it will actually cast that for you. If you have a string argument type hint, it will cast it, and that will be a copy: it will just be the result of the cast to string. I think it throws an error if you have strict types set. I think it'd be very similar, right? It's just how you want to use it in userland. With __toArray() you could cast it yourself, or with weak types PHP could cast it for you in the appropriate circumstances. If you want the same functionality today, you would need to call your own toArray() method yourself from __serialize(). In the future, you could implement __toArray(), and then your __serialize() could just cast the object to an array, which would convert it for you. __serialize() then returns that array, so you're not duplicating how you want that object represented as an array.

Derick Rethans 6:00 The RFC mentions that when you do a print_r() of a Person object, __toArray() is called. But that's not really a cast. So why would it do it there, but not for method arguments, for example?

Steven Wade 6:11 That is a product of this being my first time. It was a mistake that was thankfully pointed out during the discussion period, and it has been corrected.
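As a reference point for the toString() behaviour Steven and Derick are unsure about, here is a minimal sketch, assuming PHP 7.4 or later and a file that does not declare strict_types, of how an object with __toString() is coerced today. The names Name, greet() and Profile are hypothetical. With declare(strict_types=1) at the top of the file, cases 2 and 3 would throw a TypeError instead.

```php
<?php
// Coercive (weak) typing mode: this file does NOT declare(strict_types=1).
// Name, greet() and Profile are hypothetical names used for illustration.

class Name
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

function greet(string $name): string
{
    return 'Hello, ' . $name;
}

class Profile
{
    public string $displayName = '';
}

$name = new Name('Derick');

// 1. An explicit cast calls __toString().
$cast = (string) $name;

// 2. A `string` parameter coerces the object through __toString(),
//    so the function receives a plain string copy.
$greeting = greet($name);

// 3. A `string` typed property (PHP 7.4+) applies the same coercion rules.
$profile = new Profile();
$profile->displayName = $name;

var_dump($cast, $greeting, $profile->displayName);
// string(6) "Derick"
// string(13) "Hello, Derick"
// string(6) "Derick"
```

The proposed __toArray() was meant to mirror exactly these three paths for arrays.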
Derick Rethans 6:19 I read this RFC a week or two ago. I should have reread it this morning, and I did not, so my apologies for not being fully up to date here. There are some array functions in PHP, like sort(), that operate on an array by reference, right? That can't really work if you first have to cast to an array, which is what your current RFC now does. I mean, toArray() only gets called when you cast to an array or when it's a weakly typed argument. But how would it work for methods or functions that accept an array by reference?

Steven Wade 6:49 At least the way I proposed it, they would throw an error, as they currently do. Again, from my testing and trying to keep this in parity with toString(): I don't believe there are as many functions that operate on a string by reference as there are with arrays. From what I can recall, if you try to operate by reference on an object that implements toString(), it will throw an error.

Derick Rethans 7:10 And it wouldn't just fall back to using the object, because that would be very strange behaviour in that case, I suppose.

Steven Wade 7:15 Basically, if it's not something that can be cast or converted to an array through this method, it's just going to be the same functionality you have in current PHP, which is to throw an error.

Derick Rethans 7:24 Going for the principle of least astonishment or something.

Steven Wade 7:27 Yeah, I don't want to introduce too many changes. I just want to be able to cast.

Derick Rethans 7:31 I think that is a great idea, actually. It's the same thing I've spoken with Nikita about: introducing features step by step makes it a lot easier for people to comprehend what you actually end up doing. There's also less chance of people getting bogged down in liking a specific aspect of the RFC but not the other parts.
And then we end up not merging the whole thing because of one part of it.

Steven Wade 7:54 That's why I was very purposeful about not including any kind of writing: you cannot write to a class that implements toArray(), as you can with ArrayObject, because we have that for a reason. This is different functionality; we just wanted to keep it small and have this little helper.

Derick Rethans 8:11 I read in the RFC about something called get_mangled_object_vars(), but I didn't quite understand what it was.

Steven Wade 8:16 That was actually a function introduced in 7.4, as a direct result of my original proposal, when I was trying to see what people on internals and in the community thought of this feature. Sometime in spring last year, 2019, I began this discussion, and there was some initial feedback from folks saying that it would cause breaking changes in their libraries or their code, because they rely on the array cast. Right now, if you cast an object, you get insight into the object's internals without any side effects. I think that's how Symfony's VarDumper works, and that's how they're able to display some of that information. So there was a concern that by introducing this, that functionality would break. To provide a function that gives you the same benefits without relying on the cast, get_mangled_object_vars() was introduced, accepted, and implemented in 7.4.

Derick Rethans 9:04 And that returns the object properties with their special characters in place. Internally, if you have a private property, PHP stores its name as a null character, the name of the class, a null character, and then the property name. So that's what that would return, I suppose.

Steven Wade 9:22 I believe so.
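To make the point about mangled names concrete, here is a small sketch (PHP 7.4 or later; the Person class is only an illustration) comparing an (array) cast with get_mangled_object_vars(). Protected property names come back prefixed with a null byte, an asterisk and another null byte; private ones with a null byte, the declaring class name and another null byte. get_object_vars(), called from outside the class, only sees the public property.

```php
<?php
// Compare (array) casting with get_mangled_object_vars() (PHP 7.4+).
// Person is a hypothetical class used for illustration.

class Person
{
    public $name = 'Steven';
    protected $email = 'steven@example.com';
    private $id = 42;
}

$person = new Person();

$cast    = (array) $person;                 // what libraries like VarDumper rely on
$mangled = get_mangled_object_vars($person); // same data, without using the cast

// Print each key with its NUL bytes made visible as the two characters "\0".
foreach (array_keys($mangled) as $key) {
    echo str_replace("\0", '\0', $key), "\n";
}
// name
// \0*\0email
// \0Person\0id

// From outside the class scope, only public properties are visible here.
$visible = get_object_vars($person);        // ['name' => 'Steven']
```

For a plain class like this both results are identical; the point of the separate function is that code can read these names without depending on what the (array) cast does.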
Derick Rethans 9:22 I ran into a similar issue in Xdebug, because in some cases you want to call get_debug_info, which is what people implement for getting debug info for their objects. But in other cases you don't want to, because you want to see everything that happens internally, or all the properties that exist. So it's kind of a tricky one. And with toArray() also happening, I might at some point end up adding the output of both toArray() and get_debug_info as separate, sort of fake, properties in the Xdebug output. But of course that only works if toArray() has no side effects. I don't think there's any way of preventing the toArray() method you implement from changing information in normal properties, for example, right?

Steven Wade 10:12 That's some of the internals that I'm not fully familiar with. I'm hoping the discussion period will help illuminate some of that.

Derick Rethans 10:20 I don't think you'd be able to, actually.

Steven Wade 10:22 Just recently, we gained the ability to throw an exception from toString(). I don't know if you can actually do any kind of write operations on the object within toString(). That's a good question, and I'll look that up. Whatever that behaviour is, we'd want to mimic it here as well.

Derick Rethans 10:34 I believe you can. It's normal PHP code, right? If you don't want that, you need to clone it first, which is something you could choose as an implementation: you could first clone the object and then call the toArray() method on the cloned object. I don't think we have any protection for that. At the time of recording, the RFC is in the discussion phase. When are you thinking of ending that and going for a vote?

Steven Wade 10:58 I think this is actually going to be a longer period of discussion.
I think it will take longer than most RFCs, just because of the nature of it. I am a full-time employee, a full-time father and husband, and also a student, so I don't have a lot of time to do this. And I want to do it right; I want to be able to respond. The discussion opened up a week ago, and this morning is the first time I've been able to respond and update the RFC. Because I really care about this and would love this feature to go in, I want to continue to solicit discussion, advice, and questions, and to be able to answer them all. So however long it takes. Ideally, I would love it to be closed, voted on, accepted, and implemented in time to get in before the feature freeze for 8.0.

Derick Rethans 11:40 For that you have about four months. Is there anything else I forgot, or anything you think is interesting to know about this RFC?

Steven Wade 11:50 Yeah, the only thing I would add is that someone posted the RFC on Reddit, and I've seen discussions where people like it and people hate it. Some want to move one way, some the other. Again, it's a small feature; it's a helper. It's a tool that you can use. Is it perfect? No. Is it going to satisfy everybody? No. You've got the people who want more functional and procedural code, and you've got people who want more OOP. I think it's just another helpful tool that could be in your tool belt. If you use it, great. If you don't, you don't have to touch it.

Derick Rethans 12:19 Very well. Thank you, Steven, for taking the time to talk to me this afternoon. I'm looking forward to this coming to a vote at some point.

Steven Wade 12:27 Thank you for having me on the show, and for letting me explain the purpose and the reasoning behind this RFC. And thank you very much for giving a voice to those looking to improve the language.

Derick Rethans 12:35 You're most welcome.
Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.

Show Notes

RFC: __toArray()

Credits

Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 40: Syntax Tweaks

February 13, 2020

London, UK
Thursday, February 13th 2020, 09:03 GMT

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about a bunch of smaller RFCs.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 40. Again, I'm talking with Nikita. Perhaps we should rename this podcast to the Derick and Nikita Show at some point in the future. This time we're going to talk about a bunch of smaller RFCs that he produced related to tweaking PHP syntax for PHP 8. Nikita, would you please introduce yourself?

Nikita Popov 0:42 Hi, I'm Nikita and I do PHP core development on behalf of JetBrains. We have a couple of new and not very exciting RFCs to discuss.

Derick Rethans 0:53 Sometimes not exciting is also good to talk about. Anyway, the first one that caught my eye was an RFC called static return type. We have had return types for a while, but what is special about static?

Nikita Popov 1:07 PHP has three magic class names: self, referring to the current class; parent, referring to, well, the parent class; and static, which is the late static binding class name. static is very similar to self. If no inheritance is involved, then static is the same as self and just refers to the current class. However, if the method is inherited, and you call this method on the child class, then self is still going to refer to the original class, so the parent, while static is going to refer to the class on which the method was actually called.

Derick Rethans 1:51 Even though the method wasn't overridden.

Nikita Popov 1:54 Exactly.
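A minimal sketch of the difference Nikita describes, using hypothetical classes Model and User: self is bound to the class the method is written in, static to the class it was called on (late static binding). Return types are left off here so the example focuses on new self() versus new static().

```php
<?php
// self vs. static (late static binding). Model and User are illustrative names.

class Model
{
    public static function createSelf()
    {
        return new self();    // always the class this method is declared in
    }

    public static function createStatic()
    {
        return new static();  // the class the method was actually called on
    }
}

class User extends Model
{
    // Inherits both methods without overriding them.
}

echo get_class(User::createSelf()), "\n";   // Model
echo get_class(User::createStatic()), "\n"; // User
```

Writing new self() and getting a User back would require copying createSelf() into User, which is the duplication Nikita mentions next.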
One way to think of static is that you can more or less replace static with self, but then you would have to copy this method into every class where you want that behaviour.

Derick Rethans 2:09 You have now explained the difference between self and static. Why would you want to use static as a return type instead of self?

Nikita Popov 2:17 There are a couple of use cases; I think there are three mentioned in the RFC. The first one is named constructors. Usually in PHP we just use the __construct() method. If we had to give this method a return type, then the return type would be static, because of course the constructor always returns the class you're actually constructing, not some kind of parent class. Named constructors are just a pattern where you use a static method instead of a constructor, for example because you have multiple different ways of constructing an object and you want to distinguish them by name.

Derick Rethans 2:57 Could we also call those factory methods?

Nikita Popov 3:00 Yeah, that's a related pattern. For named constructors, you usually also want to return an object of the class that the method is actually called on.

Derick Rethans 3:09 It makes sense to attach it there, because that creates a contract that the named constructor is going to return that same class and not something else. Otherwise there's no requirement to return that same class, like you have with a constructor.

Nikita Popov 3:22 Exactly, yeah. The other pattern, which I think was popularised by PSR-7 or so, is the HTTP request object interface, where the object is immutable, and the way you change it is by calling a with-something method. This method returns a new object with that particular bit of information replaced. And again, for these kinds of APIs you usually want to return the class that the method was actually called on.
If you extend this kind of API, you don't want to get objects of the parent class back. The third one, and I think by a pretty large margin the most common one, is just normal fluent methods, where each method returns $this. $this is always an instance of static, so if you extend the class, $this is going to be an instance of the extending class, not the parent class. For this particular case, there is also a different convention in PHP where, instead of static, you actually write $this as the return type. So you use $this as a type, a special type that indicates this kind of method. static would be a slightly weaker form of that. But we might still add the special $this case in the future.

Derick Rethans 4:51 Because static would only enforce that it's the same class, but not that it's the same object.

Nikita Popov 4:56 Exactly.

Derick Rethans 4:56 Are you intending to add that to this RFC?

Nikita Popov 4:59 I like to keep different issues in separate RFCs.

Derick Rethans 5:03 It makes it easier to talk about them and get them accepted or not.

Nikita Popov 5:06 I'm not totally convinced on the $this thing, because static is, in the end, a type. I mean, we already allow self return types, and we allow parent, so it makes sense to allow static. But $this is not really a type; it's some kind of extra contract on top of the type system, and I'm not sure it makes sense to open that up.

Derick Rethans 5:28 Okay, in which positions would you be able to use the static keyword? You've already mentioned return types; are there other places as well?

Nikita Popov 5:36 No, you can only use it in return types. Anywhere else it would simply not be sound: it would violate the Liskov substitution principle. The reason why you can use static in return types is that static is basically a restriction on each inheriting class. In your original class, static is the same as self. Then in the inheriting class, static is again the same as self, but for the inheriting class.
And the inheriting class is a subclass, or subtype, of the parent class, so this is allowed by the Liskov substitution principle, or by our variance rules. If you did the same thing for parameters, you would go from having a parameter of the parent class to a parameter of the child class. So you would restrict the set of inputs that are allowed for this parameter, and that's invalid. The same argument also goes for properties.

Derick Rethans 6:37 The RFC also talks a little bit about variance and subtyping. How is static treated differently from self here, or have you just explained exactly that?

Nikita Popov 6:46 static is considered a subtype of self. If you have a parent method that uses a self return type, you can have a child method that uses a static return type, because static is a further restriction: self still allows you to return the parent class, while static does not. So you restrict the set of return values, and that's valid. Going the other direction, replacing the static type in the parent method with the self type in the child method, would not be valid, because you make the set of allowed values larger.

Derick Rethans 7:21 And that is exactly the same as the other variance rules that we have had since PHP 7.4, of course. The last thing the RFC mentions, or actually I don't quite remember whether it mentions it, is whether you can also use static as part of a union type.

Nikita Popov 7:35 Yes, you can.

Derick Rethans 7:37 Okay, that's the simple answer. I like simple answers.

Nikita Popov 7:40 Together with the other restrictions: the union type has to be in a return type position. But apart from that, you can.

Derick Rethans 7:47 That's good to hear.

Nikita Popov 7:48 There is actually one more tricky thing regarding property types. We don't allow static in property types because, as I mentioned, it would violate our variance rules.
But unfortunately we have the extra issue that we also have static properties. So if you write public static foobar, is that static for a static property or for a static type?

Derick Rethans 8:14 Right, because we don't enforce whether static goes before or after public, private, or protected.

Nikita Popov 8:21 Yeah.

Derick Rethans 8:21 At least not in the syntax. I mean, I think coding standards actually do, most of the time, require static to come first.

Nikita Popov 8:27 I think the coding standards would require you to write it as public static, not static public.

Derick Rethans 8:35 Oh, really? Okay. I thought it was the other way around. Yeah, that is difficult, because then you don't know which static is meant here.

Nikita Popov 8:41 Yes, and we just disallow it on the grammar level. It's actually a bit ugly, because we have to duplicate the whole type grammar, once to include static and once to not include it, just to deal with this ugly conflict.

Derick Rethans 8:56 That's what happens when you come up with something clever: you need clever workarounds.

Nikita Popov 9:00 It's unfortunate that the static keyword has three or maybe four completely different meanings in PHP, simply, I think, because people wanted to reuse a keyword instead of introducing a new one.

Derick Rethans 9:15 Because introducing new keywords might end up breaking people's code.

Nikita Popov 9:19 On the downside, reusing keywords makes code confusing. At least I got the impression that some people find the use of static for late static binding somewhat confusing. And I can see that if you see a method that has the signature public static whatever and returns static, and you're not super familiar with what all of that means, it can be confusing.

Derick Rethans 9:46 And that is quite a common pattern, because these named constructors are static methods that return static.
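The three use cases and the variance rule from this part of the conversation can be sketched as follows. This assumes PHP 8.0, where the static return type was added; every class and method name is hypothetical, and the "wither" only mimics the PSR-7 style rather than implementing any PSR-7 interface.

```php
<?php
// Requires PHP 8.0+ (static return type). All names are illustrative.

class Message
{
    private array $headers = [];
    private int $touched = 0;

    // 1. Named constructor: `new static()` builds the class it was called on.
    public static function create(): static
    {
        return new static();
    }

    // 2. Immutable "wither" in the PSR-7 style: returns a modified copy.
    public function withHeader(string $name, string $value): static
    {
        $copy = clone $this;
        $copy->headers[$name] = $value;
        return $copy;
    }

    // 3. Fluent method: $this is always an instance of static.
    public function touch(): static
    {
        $this->touched++;
        return $this;
    }

    // static may also appear in a nullable (or union) return type.
    public function previous(): ?static
    {
        return null;
    }

    public function headers(): array
    {
        return $this->headers;
    }
}

class Request extends Message
{
}

// Variance: a child may narrow a `self` return type to `static`, because
// static is a subtype of self. The reverse would be a fatal error.
class Base
{
    public function copy(): self
    {
        return clone $this;
    }
}

class Derived extends Base
{
    public function copy(): static
    {
        return clone $this;
    }
}

$request = Request::create()->withHeader('Accept', 'text/html')->touch();
echo get_class($request), "\n"; // Request
```

With self in place of static, Request::create() would still have to return a Request in practice, but the declared type would only promise a Message, so static documents the contract more precisely.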
Let's move on to the next one, which is a tiny RFC that you came up with: the class name literal on objects. What does this do?

Nikita Popov 10:03 There is the syntax where you write a class name, then double colon class, and that just returns the fully qualified class name. For example, if you have a use statement for that class, you get back the full name instead of the short name. I think we've had this since PHP 5.5. It's a great feature because it makes it clear that you're referencing a class and not just some random string, which means, for example, that IDE refactorings can work better and so on.

Derick Rethans 10:35 Okay.

Nikita Popov 10:36 The actual RFC is very simple. Currently, the ::class syntax is only allowed on literal class names, but with this RFC you can also take an object variable and get the class of that object using this syntax.

Derick Rethans 10:48 However, PHP has a function for that already, which is called get_class(), right?

Nikita Popov 10:52 Exactly. This is essentially just syntax sugar for get_class(). The reason why we want the syntax sugar is not so much that writing get_class() is particularly hard, but just that people expect it to be there. The ::class syntax looks a lot like a class constant access, so it looks like every class has a magic constant called class. Usually you are able to access class constants on objects: you can write the object, the double colon, and the constant name, and that works. In that case, we just take the class of the object and access the constant on that class. For consistency reasons, it only makes sense that you can do the same with this particular magic constant as well. That's really all the motivation.

Derick Rethans 11:43 Originally, the class literal colon colon class was resolved at compile time. Of course, that can't happen with object colon colon class. Is that still true or no longer?

Nikita Popov 11:53 It wasn't really true in the first place.
For normal class names, of course, it is resolved at compile time. Actually, one of the gotchas with this syntax is that some people expect it to validate that the class actually exists. They expect the class to be autoloaded, and to get an error if it doesn't exist, but that doesn't happen. You can reference a non-existent class with this syntax just fine, though usually your IDE is going to show a warning for that. As we just discussed, we also have a couple of magic class names: self, parent, and static. The static one in particular always has to be resolved at runtime, because we don't know which class the method is actually going to be called on. Actually, self and parent also sometimes have to be resolved at runtime, and there are two cases where that can happen. One is if you use traits, because in that case self refers to the using class, not to the trait. The other is closures, where self refers to the bound scope: the last argument of the bindTo() method is the scope you're using. So in those cases, it's already dynamically resolved.

Derick Rethans 13:09 Okay. The RFC mentions one specific area where you can't use colon colon class. In which situation can you still not use colon colon class on objects?

Nikita Popov 13:20 You can always use it on an object. I think what you're referring to is that, for normal class constants, you can also put the class name as a string inside a variable and then access the constant on that variable.

Derick Rethans 13:38 Oh, right. Yes.

Nikita Popov 13:39 For the double colon class syntax, we don't want to allow that. First, it's kind of useless, because it would just return the same string you gave it. And I think in that case the fact that the class name is not validated would be especially confusing.

Derick Rethans 14:00 Okay, that makes sense.
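The rules Nikita lists for ::class can be checked with a short sketch, assuming PHP 8.0 (where ::class on objects landed). The namespace, the User class and the DoesNotExist name are all hypothetical. It shows that $object::class matches get_class(), that a literal name is neither autoloaded nor validated, and that using ::class on a string variable is rejected at runtime with a TypeError.

```php
<?php
// Requires PHP 8.0+ for $object::class. All names are illustrative.
namespace App\Model;

class User
{
}

$user = new User();

// On an object, ::class is syntax sugar for get_class($object).
echo $user::class, "\n";                      // App\Model\User
var_dump($user::class === get_class($user));  // bool(true)

// A literal class name is resolved at compile time: the class is neither
// autoloaded nor checked for existence.
$autoloaded = [];
spl_autoload_register(function (string $class) use (&$autoloaded): void {
    $autoloaded[] = $class;
});
$missing = DoesNotExist::class;               // "App\Model\DoesNotExist"
// $autoloaded is still empty: no autoload attempt was made.

// ::class on a string that merely holds a class name is rejected at runtime.
$className = 'App\Model\User';
try {
    $result = $className::class;
} catch (\TypeError $e) {
    echo get_class($e), "\n";                 // TypeError
}
```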
So you can only use colon colon class on literal class names, as you already could, as well as on variables that contain an object? Nikita Popov 14:09 That's right, yeah. Derick Rethans 14:10 That sounds great. Does it show up differently in reflection? Nikita Popov 14:13 This magic class constant actually doesn't show up in reflection at all. It looks like a constant, but it's really a special syntax that just happens to share the look with constants. Derick Rethans 14:24 Do you expect any controversies about this? Nikita Popov 14:27 I don't think so. Derick Rethans 14:28 I don't think so either. I can't really see anything that people could complain about too much. I do, however, think that for the next RFC that you came up with, the variable syntax tweaks, there will be a little bit of haggling about whether this is a good idea to do. In PHP 7.0, we got this uniform variable syntax. Could you give a brief reminder of what it was about? Nikita Popov 14:48 That was about, well, fixing a couple of inconsistencies when it comes to variable syntax. Variable syntax in PHP is extremely, extremely magic. Our expression syntax is nice and regular, but the variable syntax is a huge assortment of special rules, and that RFC made those rules a little bit less special and more regular. Derick Rethans 15:17 From what I understood, we missed a few inconsistencies that we probably also should have addressed in that RFC. And that is what you're now trying to tweak? Nikita Popov 15:24 All of these remaining inconsistencies are really, really minor things and edge cases. But weirdly, all, or at least most, of them are something that someone at some point ran into, and either opened a bug or wrote me an email or pinged me on Twitter. So people somehow managed to still run into these things. Derick Rethans 15:52 The RFC mentions four specific things that we've missed. What is the first one?
Nikita Popov 15:57 Yeah, so it's probably going to be somewhat hard to talk about some of these examples. Derick Rethans 16:02 I know, because I think some of them make no sense whatsoever. Nikita Popov 16:05 Yeah. Derick Rethans 16:06 Because how do you call a method on a string? Nikita Popov 16:07 Context for this one is that I have a nice little extension called scalar objects, which allows you to more or less define methods on strings, on integers, on arrays, and so on. With the uniform variable syntax, we allowed calling methods on string literals. That actually makes no real sense in baseline PHP. But if you're using scalar objects, then this is a useful feature, because you can do something like take a string literal and call length on it, while otherwise you'd have to wrap it in brackets. Derick Rethans 16:45 So it's just a syntax change, pretty much. Nikita Popov 16:48 Well, what this particular one is about is that right now this works if it's a literal string, but if you have any variables inside it, then it suddenly stops working, which is just very... Derick Rethans 16:59 So it is the interpolated strings inside double quotes, with the dollar variable name syntax. That's the problem? Nikita Popov 17:05 Yeah. Derick Rethans 17:06 The second one is called constant dereferenceability, which is a word I can't pronounce. And my text editor says it's not a word. So what do you mean by it? Nikita Popov 17:14 That's a good question. I think the term is more or less picked up from C, where we have pointers, and we can dereference pointers to access what the pointer points to. That's the star operator in C. In PHP, we use the term dereference to also mean accessing some kind of structure in some way. For example, to access an array element, that's array dereference, or to access an object property, that's object dereference, and so on. That particular one is, I think, two things. One is that you can, for example, access the first character of a constant.
So you write the constant name, then brackets, zero. Well, maybe that's not the best example; I can think of a better one. Say you have a constant that contains an array, and you want to access a specific key in that array. That's something you can already do right now. The same syntax does not work if the constant is a magic constant. What also doesn't work is if you use our alternative array access syntax. We have the square brackets, which is what people should use, and we have the curly braces, which is the alternative way to access arrays and which is actually deprecated as of 7.4. I'm not totally sure whether that's going to be removed in PHP 8 or not. If it's going to be removed, then this part is a moot point. But yeah, this is again, I think, from a practical perspective, not really interesting. The only situation where I think this is useful is, again, of course, scalar objects, because it means you can call methods on the constant. Derick Rethans 18:57 Okay, so the grammar currently disallows doing that? Nikita Popov 19:00 Currently it's disallowed, and this would allow it. Derick Rethans 19:03 The third one is related: class constant dereferenceability. Nikita Popov 19:07 Someone complained about this one on Twitter. I don't know how they ended up trying to do this. Something you can do right now is access a static property, then interpret the content of that static property as a class name, and access another static property on that. So you can chain these static property accesses. For some reason, the same does not work with class constant access. So static property accesses can be chained, but class constant accesses can't be. Again, for no particular reason. This change would allow that as well. Derick Rethans 19:41 This is even a change that makes sense without having to use scalar objects. Nikita Popov 19:45 That one is.
I wouldn't write that kind of code, but it logically makes sense. Derick Rethans 19:50 And then the last one, in the RFC, is called arbitrary expression support for new and instanceof. Nikita Popov 19:56 Yeah, so this is probably the only one that's actually useful for something. PHP has, well, a bunch of places where usually you have to place either an identifier or a namespace name, like a class name, a method name, or a property name, and so on, or even a variable name. For all of these places, we usually support some kind of special syntax to use a general expression instead. For example, instead of a variable with a static name, you can use curly braces to use a dynamic name. Derick Rethans 20:26 I think for new, we did it at some point already. Nikita Popov 20:29 For new, this doesn't exist yet. You can use a variable as the class name, but you can't actually compute the class name as part of the expression. Derick Rethans 20:40 I think what I was referring to is that you can use parentheses around the whole new class expression, so you can call methods on it. So that's what I meant, but this is specifically about using an expression after new. Nikita Popov 20:51 Yeah, so these are two things. One is whether you can use an expression for the new class name, and the other is whether you can use the new itself as an expression. And yeah, right now we don't support that for new. And we also don't support it for instanceof, that is, the right-hand side, which contains the class name. The RFC just proposes to allow an expression in parentheses there. And this kind of stuff is, again, well, not particularly useful. But it is useful for things like code generation, where you may have to insert arbitrary expressions into the generated code. And there are actually some nice hacks that you can use right now. You can use a variable with a complex expression inside it, where you assign to the variable itself and then return its name.
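The dereferenceability and new/instanceof tweaks can be sketched as follows, assuming PHP 8.0 or later, where the Variable Syntax Tweaks RFC was eventually implemented. `Inner`, `Outer` and `firstLetterOfName()` are illustrative names only. The sketch contrasts the already-allowed static property chain with the class constant chain, shows array access on a magic constant, and uses parenthesized expressions after `new` and `instanceof`.

```php
<?php
// Assumes PHP 8.0+, which includes the Variable Syntax Tweaks RFC.

class Inner
{
    public static $greeting = 'hello from Inner';
}

class Outer
{
    // Both hold the name of another class.
    public static $target = 'Inner';
    const TARGET = 'Inner';
}

// Chaining static property accesses already worked before PHP 8.
$viaProperty = Outer::$target::$greeting;

// Chaining off a class constant is what the RFC adds.
$viaConstant = Outer::TARGET::$greeting;

// Array access directly on a magic constant is also allowed now.
function firstLetterOfName(): string
{
    return __FUNCTION__[0];
}

// Arbitrary expressions in parentheses after new and instanceof.
$prefix = 'Inn';
$object = new ($prefix . 'er');
$isInner = $object instanceof ($prefix . 'er');
```

The parentheses are what make the expression unambiguous; without them, `new $prefix . 'er'` would still parse as `(new $prefix) . 'er'`.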
Derick Rethans 21:42 I don't think I understand this. You're saying you can construct a string with a complex expression in it. Nikita Popov 21:48 Not a string. You write something like new variable, but with the curly brace syntax, and in there you start off with a string containing some kind of dummy variable name, and then you concatenate that with an empty string. But that empty string is computed by doing the assignment to the variable name that you're actually going to return. Derick Rethans 22:12 I still don't understand this. You know what I'm going to do? I'm just going to link to an example for this in the show notes. Nikita Popov 22:19 It's not really important. You can just cut this part. Derick Rethans 22:22 Yep, sure, I can do that too. Perfectly fine. Nikita Popov 22:24 Nice hack. Derick Rethans 22:24 But let's not teach people too many hacks, I think. Thank you for taking the time with me today, Nikita, to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we've started voting on them, and we'll see what happens to them. Nikita Popov 22:37 Thanks for having me once again. Derick Rethans 22:41 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: Static Return Type RFC: Class Name Literal on Object RFC: Variable Syntax Tweaks RFC: Uniform Variable Syntax RFC Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 39: Stringable Interface

February 06, 2020

PHP Internals News: Episode 39: Stringable Interface London, UK Thursday, February 6th 2020, 09:02 GMT In this episode of "PHP Internals News" I chat with Nicolas Grekas (Twitter, GitHub, LinkedIn, Symfony Connect) about the new "Stringable Interface" that Nicolas is proposing, as well as about voting rights (on RFCs). The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hello, this is Episode 39. Today I'm talking with Nicolas Grekas about an RFC that he's produced called Stringable interface. I already spoke with Nicolas last year about the work that Symfony does when new PHP versions come out, to look at deprecations and to make sure that versions of Symfony work with new versions of PHP. But this time Nicolas came up with his own RFC, called the Stringable interface. Nicolas, could you explain what Stringable is? Nicolas Grekas 0:54 Hello. Stringable is an interface that people could use to declare that they implement the magic __toString() method. Derick Rethans 1:02 Because currently it's not necessary to implement an interface, and PHP's internals will always use toString if it is available in a class, right? Nicolas Grekas 1:10 Yeah, absolutely. Derick Rethans 1:11 What is the reason why you would want to have a Stringable interface? Nicolas Grekas 1:16 The reason is to be able to benefit from union types in PHP 8. Right now, if you want to accept a string as an argument, it's pretty easy. You just add the string type, right? Let's say now you want to accept a string or a stringable object, a stringable object being something that implements this method.
If you want to do that, you cannot express the type using types today. Derick Rethans 1:42 Because if you choose string and then the name of a class, that would only allow that specific class. Nicolas Grekas 1:47 Yes. There are some cases in Symfony especially, because this is where I work and do open source, where we do want to not call the toString method until the very latest moment. A first example is from Drupal. Drupal computes some constraint validation messages lazily, and it's pretty important to them, because computing the message itself is pretty costly. They don't need to compute it all the time. Actually, we added the type, the string type, in Symfony 5 before it was released, and Drupal came and said: Oh, this is breaking our code and our features, what should we do now? And we removed the type and replaced it with an annotation saying: Okay, this is a string or a stringable object. So in the future, when we adopt PHP 6, we would like to be able to express that using a real type. Derick Rethans 2:41 PHP 6? Nicolas Grekas 2:42 No, PHP 8, that's true. Strings and PHP 6. Derick Rethans 2:49 Yay. Nicolas Grekas 2:51 Another example is pretty similar, actually. It's in the Symfony autowiring system. We have services that we wire, and sometimes we cannot; the autowiring logic doesn't work because some class cannot be autowired. So in this case, we have a lazy message, because sometimes a service, while it's not autowireable, is going to be removed later on, because Symfony removes unused services. So instead of computing ahead of time an error message that is heavy to compute, and that we might just trash because the service is going to be removed, we have this lazy thing, because, yeah, it's heavy to compute. So, real-world use cases. Derick Rethans 3:32 I think the intention of having a Stringable interface actually makes sense.
What are the concerns for adding this to your own code? Are there issues with backwards compatibility, for example? Nicolas Grekas 3:43 That's another goal of the RFC. The way I have designed it is that I think current code should be able to express the type right now, using annotations of course. What I mean is that the interface, the proposal, Stringable, is very easily polyfilled. We just create this interface in the global namespace, declare the method, and done. So we can do that now. We can improve the typings now, and then in the future we'll be able to turn that into an actual union type. Derick Rethans 4:16 You'd be able to do that almost immediately. Well, you would be able to do that in PHP 8. Nicolas Grekas 4:21 Yeah. Derick Rethans 4:21 Without it being a problem. And of course, in that case, you can remove the polyfilled Stringable interface. Nicolas Grekas 4:27 Yeah, absolutely. Derick Rethans 4:28 This is going to impact extensions as well, because extensions, I mean, PHP internal functions, often accept strings. I don't actually remember, but if you use a scalar type hint string for an internal method or internal function, PHP actually calls toString on objects. Like if you call strlen() on an object that implements toString, it would actually call toString and return the length of that result. Nicolas Grekas 4:53 Yes, absolutely. Derick Rethans 4:54 So that wouldn't impact that specific case then. Nicolas Grekas 4:57 About extensions: on the current state of the implementation for extensions, there was a discussion that we're going to talk about a bit later, I think. The current state is that the interface declares the method, right, and it declares the return type. It's colon string. So the declaration is "public function __toString(): string". The very first version didn't have the return type, because it's easier for backward compatibility.
Because current code doesn't need the return type. So by not adding it to the interface, we don't break backward compatibility, which is another critical design feature that I wanted to have. And so feedback came on the first pull request that said: okay, we need the return type. So the way I implemented it is that now, in the RFC, the return type is implicit. For toString, if you declare it, whether you type ": string" or not, it's there. If you do some reflection later on an instance of such a class, then reflection will tell you: yes, there is a return type and it's string. Derick Rethans 6:01 Whether you have defined it or not in your class. So that's a little bit of magic that gets added on. Nicolas Grekas 6:07 It doesn't break any semantics, because the return type is already enforced: you cannot return anything other than a string right now. Derick Rethans 6:14 Yeah, that's true. So that means that toString methods automatically get a return type that requires a string to be returned. Nicolas Grekas 6:21 Yes. Derick Rethans 6:22 And that tweak was necessary to make sure that no backward compatibility was being broken. Nicolas Grekas 6:27 Yes. Derick Rethans 6:28 Does that also extend to extensions that are not part of the PHP core distribution? Do they need to be changed as well? Nicolas Grekas 6:35 Right now, in the current implementation, yes, they need to be changed. If they declare the toString method, they need to change the type, basically, to declare that they return a string explicitly in the C code. In the current state, it's pretty easy on the implementation side to ask that of the extension authors, right? I think it is doable, but Nikita today posted a proposal to improve the RFC and take it to the next level. And the next level would be to have the same magic for the declaration of the interface itself.
So it would mean that if you declare a toString method, then you implement the Stringable interface without having to explicitly declare it in the class. Derick Rethans 7:22 I think that actually makes quite a bit of sense, because that is pretty much how toString is used already. The PHP engine already enforces that it has to be a string that's being returned. Nicolas Grekas 7:31 Yeah, that's very interesting, and that would make the type much more useful as a type hint, because any pre-existing code would just work with the type and pass the type checks and so on. So that would be great. The link with extensions is that maybe we should have the same automatic, implicit declaration applied to extensions. Then extensions wouldn't have to do anything, and done. That would declare both the return type and the interface. Derick Rethans 8:03 That makes sense. You mentioned that Nikita just suggested something to tweak this RFC. I reckon this RFC is still open for discussion and voting hasn't started on it yet. Nicolas Grekas 8:13 Yes. Derick Rethans 8:13 Do you have any sort of idea of a timeframe where you think this will be finished? Nicolas Grekas 8:17 The earliest is February 6, because we know we need to wait two weeks. So I opened it, and there we go. I don't know how to write the last part of what we discussed, Nikita's suggestion, so I'm asking him for some help. As soon as it's ready, I think it can be opened for voting. So it can be 10 days. It didn't trigger much discussion on internals, which, I don't know, maybe is a good sign. Or maybe it means people will vote against without expressing why, I don't know. I hope it's a good thing. Derick Rethans 8:50 Sometimes people only start paying attention once there's a vote. Nicolas Grekas 8:53 Yeah.
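A sketch of the approach Nicolas describes, assuming PHP 8.0 or later for the `string|Stringable` union type and constructor property promotion. The guarded interface declaration mirrors the polyfill idea for older PHP versions (a library that still supports those would use a docblock annotation instead of the union type, since this file would not parse there). `LazyMessage`, `Temperature` and `report()` are hypothetical. In PHP 8.0 as released, Nikita's suggestion was adopted: a class that declares `__toString()` implicitly implements `Stringable`, which is what `Temperature` relies on.

```php
<?php
// Polyfill: declare the interface only where PHP does not provide it (< 8.0).
if (PHP_VERSION_ID < 80000) {
    interface Stringable
    {
        /** @return string */
        public function __toString();
    }
}

// Hypothetical lazy message: the expensive text is only built when cast to string.
final class LazyMessage implements Stringable
{
    /** @var callable */
    private $factory;
    public bool $computed = false;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public function __toString(): string
    {
        $this->computed = true;
        return ($this->factory)();
    }
}

// Hypothetical class that never mentions Stringable; on PHP 8.0+ it
// implements the interface implicitly because it declares __toString().
final class Temperature
{
    public function __construct(private float $celsius) {}

    public function __toString(): string
    {
        return sprintf('%.1f C', $this->celsius);
    }
}

// PHP 8.0+ union type: accept a plain string or any Stringable object,
// and only convert to string when the value is actually needed.
function report(string|Stringable $message, bool $verbose): ?string
{
    return $verbose ? (string) $message : null;
}
```

Because a `LazyMessage` instance already matches the `Stringable` member of the union, passing it to `report()` does not trigger `__toString()`; the conversion happens only at the explicit `(string)` cast.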
Derick Rethans 8:54 So there wasn't a lot of controversy about Stringable, as you just said, but there was some controversy about you actually applying for voting rights. Do you remember what happened there? Nicolas Grekas 9:03 So yes, I applied for a vote, because of my involvement. I think I'm an active PHP contributor to internals, not on the C side, but... Okay, so since I wanted to open this RFC, I said: Okay, now it's time to do the bureaucratic steps to get a vote, right? Derick Rethans 9:23 Yep. Nicolas Grekas 9:24 And I think I'm the first person to actually go through some process for getting a vote in itself. I mean, I think most people, or maybe all people, that have a vote have it as a side effect of something else. Derick Rethans 9:38 Yeah, usually by contributing patches, either to PHP itself, documentation, or extensions. Nicolas Grekas 9:43 So I think there has been some confusion, but it's been sorted out pretty quickly. I think I'm going to be able to vote on the next RFC. I'll report back if I can. Derick Rethans 9:54 Okay, fair enough. Currently, we don't really have a process for this at all. I mean, you get to vote when you have a Git account, pretty much, or PHP commit access in some form. And I don't think we've ever really thought about handing that out to people that have been contributing a lot, right? So that's kind of an interesting thing to see. What we have seen in the past is people just saying: Yeah, I'd like to vote, or in other cases: Yeah, can I have a php.net email address, right? That also happened, because that is a side effect of getting commit access. Nicolas Grekas 10:23 Okay. Derick Rethans 10:24 At the moment, what happened when you did it is that it got immediately shut down, probably a bit quicker than was nice, without any discussion.
But I think in the future, we do need to come up with a plan, and perhaps even think about how to approach voting on features and RFCs in the first place, because we don't really have a set guideline on who gets to do this and who doesn't, and stuff like that. Nicolas Grekas 10:49 Yeah, it's pretty interesting. Nikita, during the discussion, posted some stats on the number of people who can vote, and I think the number is like 1,900. Derick Rethans 10:59 Yeah. That's quite a lot. Nicolas Grekas 11:01 It's a bit strange. And most people don't vote, I think, because they think they shouldn't. I don't know, something like that. But it's true, it's pretty strange. What I like about this situation is that it doesn't draw a strong line between people that contribute C code and people that write PHP code. And it's nice for PHP. I really think it's nice for PHP to have people that vote that don't write C code. But I think, of course, people that write C code must have the strongest voice, because at some point the implementation decides. Derick Rethans 11:35 Well, that is a difference, right: the votes are usually on the idea, not on the implementation. But sometimes the implementation is so complicated that it's nearly impossible to implement. Like, I've very briefly spoken with Nikita about generics. I'm sure we'll talk about that at some point. I'm pretty sure that generics is a simple idea, I mean, people will vote for it, but as an implementation it might not be that simple to do. Nicolas Grekas 12:01 Yeah. Derick Rethans 12:02 So what happens if you vote for the feature, but you can't come up with a good implementation? Nicolas Grekas 12:06 I'm on the side of thinking that people should vote on the implementations. I mean, people shouldn't be able to vote only on an idea.
If there is an idea, it will be supported by an implementation that proves that we are talking about something real, not just a fancy idea that might not work in practice. So that's my opinion. Derick Rethans 12:24 That's a good point. But as you said, of the 1,900 people, or 1,900-plus, that can vote, most of them are not familiar with PHP internals whatsoever, because they tend to be contributors to the documentation. This is also very valuable, but it doesn't mean you necessarily know PHP internals. Nicolas Grekas 12:40 Yeah, sure. Derick Rethans 12:41 The other way can be true as well, right? You might know a lot about PHP's internals, but never really use PHP in real life, in your job, or anything like that. Nicolas Grekas 12:48 So it's also good to be able to team up with someone that knows how to code the C part, the internal part. You have the idea, you're the supporter part of the team, and then being able to convince someone to do the implementation, or to help you do it, is also a kind of proof of interest. So, starting small, bringing more people into the boat, and making it happen. Derick Rethans 13:12 Yeah, and we saw some of that happening last year. I can't quite remember what feature it was or exactly what it was. But I agree with you. I think it is important that you can at least convince somebody to implement the feature before just voting on the idea. Thank you for taking the time with me this morning, Nicolas. Nicolas Grekas 13:30 And thank you, Derick, for having me again. Derick Rethans 13:32 If it continues like this, I'm sure we'll speak again at some point in the future. Nicolas Grekas 13:35 Okay. Derick Rethans 13:39 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool.
You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: Stringable Interface Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 38: Preloading and WeakMaps

January 30, 2020

PHP Internals News: Episode 38: Preloading and WeakMaps London, UK Thursday, January 30th 2020, 09:01 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about PHP 7.4 preloading mishaps, and his WeakMaps RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: WeakMaps Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weeklish podcast dedicated to demystifying the development of the PHP language. This is Episode 38. I'm talking with Nikita Popov about a few things that have happened over the holidays. Nikita, how were your holidays? Nikita Popov 0:34 My holidays were great. Derick Rethans 0:36 I thought I'd start with something else than I did last year. In any case, I wanted to talk to you this morning about something that happened to PHP 7.4 over the holidays, and that is issues with preloading on Windows with PHP 7.4. I have no idea what the problem is here. Could you try to explain this to me? Nikita Popov 0:56 So there were actually quite a few issues with preloading in early PHP 7.4 releases. The feature definitely did not get enough testing. Most of the issues have been fixed in 7.4.2. But if you're using opcache.preload_user, which is what you have to use if you're running as root, then you will probably still see crashes, and that's going to be fixed in the next release. Derick Rethans 1:20 In 7.4.3. Nikita Popov 1:22 Right. But to get back to Windows, Windows has a, well, very different process architecture than Linux. In particular, on Linux or BSD we have fork, which basically just takes a process and copies its entire memory state to create a new process. This is a lot cheaper than it sounds, because it reuses memory until it's actually changed.
Derick Rethans 1:48 It's copy-on-write. Nikita Popov 1:49 Copy-on-write, exactly. The same functionality does not exist on Windows, or at least it's not publicly exposed. So on Windows, you can only create new processes from scratch, which don't reuse memory from the previous one. And for OPcache, this is a problem, because OPcache would really like to reference internal classes as defined by PHP. But because we store things in shared memory, which is shared between multiple processes, we now have the problem that these internal classes can reside at different addresses in these different processes. On Linux, it's always going to be the same address, because we are forking and that keeps the address. On Windows, each process could have a different address. And especially because Windows, since I think Windows Vista, uses address space layout randomization, this is actually pretty much always going to be a different address. Derick Rethans 2:51 Because that's a security feature? Nikita Popov 2:52 Exactly. It's a security feature. Derick Rethans 2:54 Would it also be a problem on Linux if you'd start a process instead of forking it? Nikita Popov 2:59 Yes, it would be a problem. The difference is just that on Unix, we don't do that. OPcache has quite a different architecture on Windows. On Linux, we do not allow attaching to an existing OPcache from a separate process, so the only way to share an OPcache is to use fork. On Windows, because of this restriction that we don't have fork, we do allow this kind of attachment, and that's where we have to deal with these kinds of issues. So that's actually a general problem, not just for preloading. The difference is just that normally we can just say: hey, okay, we do not allow any references to internal classes from shared memory on Windows. It's a slight hit to optimization, but it's not super important. While with preloading, we have to link the entire class graph during preloading.
And if you have any classes that, for example, extend from an internal class, like extending from Exception, or in some cases if you just use an internal class as a type hint, then we would not be able to store these kinds of references in shared memory on Windows. And because for preloading it's pretty much inevitable that you run into this situation, you just can't realistically do preloading on Windows. Derick Rethans 4:18 Hence the decision to just turn it off, instead of trying and pretty much always failing. Nikita Popov 4:24 Yeah, I mean, it kind of did work before; you just got a bunch of warnings that these classes haven't been preloaded. And if people try it with a simple example, they'll see: great, preloading is working. But once they move to their actual complex application that uses internal classes at various points, it turns out that, actually, no, it doesn't really work in practice. And so we just disabled it entirely. Derick Rethans 4:51 That seems like a reasonable solution to this. Do you think at some point this can be fixed in another clever way? Nikita Popov 4:58 Well, the main way in which it can be fixed is to avoid this kind of multi-process attachment on Windows. The alternative to having multiple processes is to have multiple threads, which do share an address space. Basically the same as fork, just with threads. But that, of course, depends on what kind of web server you're using and what kind of SAPI you're using. And I think nowadays, while threaded web servers are somewhat more popular on Windows than on Linux, it's still not the majority deployment strategy. Derick Rethans 5:34 I think it used to be that threaded process models on Windows were a lot more common when PHP just came out for Windows, because it was an ISAPI module, which was always threaded. From what I remember, that's the original reason why we had ZTS in the first place.
Yeah, at some point that started moving to PHP-FPM kind of models, because it didn't use threading and tended to be a lot safer to use that way. Nikita Popov 5:57 Right. I mean, threading has issues, in particular because things like locales are per process, not per thread. So processes are usually safer to use. Derick Rethans 6:08 Anything else interesting that went wrong with preloading, or do you not want to mention it? Nikita Popov 6:12 The rest is mostly just that we have two different ways of doing preloading. One is using opcache_compile_file(), and the other is using require or include, and the difference between them is that opcache_compile_file() compiles the file but does not execute it. In that case, the way we perform preloading is that we first collect all classes and then we gradually link them, actually register them, always making sure that all the dependencies have already been linked. And this is the mode that I think mostly worked well at the release of PHP 7.4. The other one is the require approach, where require directly executes the code and registers the classes. And in that case, basically, if it turns out that some kind of dependency cannot be preloaded for some reason, we simply have to abort preloading, because we cannot recover from that. This abort was missing. And it turns out that, in the end, the way people actually use preloading is the require approach, not the opcache_compile_file() approach. Derick Rethans 7:26 Although that's the one used in most of the examples that I've seen, and in the documentation. Nikita Popov 7:30 Right, it has some advantages over using require. Derick Rethans 7:34 Something else that happened over the holidays is that you've worked on several RFCs, too many to talk about in this episode.
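The two preloading styles Nikita contrasts can be sketched in a preload script, i.e. the file that the `opcache.preload` php.ini directive points to (when PHP runs as root, `opcache.preload_user` must also be set). `preloadFiles()` is a hypothetical helper and the file list is up to the application; this only illustrates compile-only versus require, and is not a recommended preloading strategy.

```php
<?php
/**
 * Hypothetical helper for a preload script.
 *
 * @param string[] $files       paths of PHP files to preload
 * @param bool     $compileOnly true:  opcache_compile_file() compiles each file into
 *                                     OPcache without executing it (needs OPcache loaded)
 *                              false: require_once executes each file, registering
 *                                     its classes immediately
 */
function preloadFiles(array $files, bool $compileOnly): void
{
    foreach ($files as $file) {
        if ($compileOnly) {
            opcache_compile_file($file);
        } else {
            require_once $file;
        }
    }
}

// Typical use inside the preload script (the path is a placeholder):
// preloadFiles(glob('/var/www/app/src/*.php'), false);
```

With the require style, a class whose parent or interface cannot be loaded stops preloading at that point, which is the abort Nikita mentions; the compile-only style leaves the linking order to OPcache.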
But one of the earlier ones was the WeakMaps RFC, which sort of builds on top of the weak references that we already got in PHP 7.4. What's wrong with weak references, and why do we now need weak maps? Nikita Popov 7:58 There's nothing wrong with weak references. As a reminder of what weak references are about: they allow you to reference an object without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference, and if you try to access it, you get back null instead of the object. Now, probably the most common use case for any kind of weak data structure is a map, or an associative array, where you have objects and want to associate some kind of data with them. Typical use cases are caches or other memoizing data structures. And the reason why it's important for this to be weak is that, well, if you want to cache some data with an object, and then nobody else is using that object, you don't really want to keep that cache data around, because no one is ever going to use it again. It's just going to take up memory. And this is what the weak map does. You use objects as keys, and some kind of data as the value. And if the object is no longer used outside this map, then it is also removed from the map. Derick Rethans 9:16 So you mentioned objects as keys. Is that something new? Because I don't think PHP currently supports that. Nikita Popov 9:22 I mean, you can't use objects as keys in normal arrays. That doesn't work. But, for example, the ArrayAccess interface and the Traversable interface don't really care what your types are, so you can use anything. Derick Rethans 9:37 I glossed over that point, yes. But a weak map is something that then implements ArrayAccess. Nikita Popov 9:44 That's right. Derick Rethans 9:45 What does the interface of a weak map look like? How would you interact with it?
Nikita Popov 9:49 Yeah, actually, it just implements all the magic interfaces in PHP. So ArrayAccess: you can access the weak map by key, where the key is an object. Traversable: you can iterate over the weak map and get both the keys and values. And of course Countable, so you can count how many elements there are in there. And that's it. Derick Rethans 10:12 All the methods? That's plenty of them then; there should be nine or ten or so, right? Nikita Popov 10:17 Five. Derick Rethans 10:18 No, there's the six of Iterator. Nikita Popov 10:20 Right, yeah, there is this little detail that when internal classes implement Traversable, they don't actually have to implement the Iterator methods. That's why there are a few less. Derick Rethans 10:33 Who's going to benefit from this new feature? Nikita Popov 10:35 One of the users of weak maps would be things like ORMs, where database records are represented as objects, and there is data stored related to these objects. And I think it's a well-known issue that if you're using ORMs, you can sometimes run into memory usage issues. And the absence of weak structures is one of the reasons why that can happen: they just keep holding onto information even though the application actually doesn't use it anymore. Derick Rethans 11:12 Did a specific ORM request this feature? Nikita Popov 11:15 I don't think so. Derick Rethans 11:16 Because weak maps are done as an internal class in PHP, how are these things implemented? Is there something interesting? Because I remember talking to Joe about weak references last year: there is some functionality where it would automatically do something on the destruction of the objects. Is this something that also happens with weak maps? Nikita Popov 11:37 So yeah, the mechanism of how weak references and weak maps work is basically the same. There is a flag on each object that can be set to indicate that it is referenced by a weak reference or weak map.
If the object is destroyed and has this flag, then we execute a callback that is going to remove the object from the weak reference, or from the weak map, or from multiple maps. Derick Rethans 12:05 Is there some kind of registry that links an object to these? Nikita Popov 12:08 So we store all the weak references and weak maps that the object is part of, so we can efficiently remove it. Derick Rethans 12:16 When I was reading the RFC, I saw spl_object_id() mentioned, which is a way to basically identify a specific object. Is this something related to weak references or weak maps? Or is it something that is no longer used, or that people should pretty much no longer use? Because I guess this was previously a way to identify an object and then associate extra data with it, like you mentioned ORMs would do for caching. Nikita Popov 12:44 Right. So it's kind of related, but also not. One is not a replacement for the other; they're just different use cases. We have had spl_object_hash() for a very long time. And I think somewhere around PHP 7.0, or maybe later, spl_object_id() was introduced, which does the same but returns an integer, and because of that is more efficient. But in the end, what these functions do is return a unique identifier for an object. But this identifier is only unique as long as the object is alive. These object IDs are reused when objects are destroyed. Derick Rethans 13:30 And that makes them not usable for associating cache data with a specific object? Nikita Popov 13:35 That makes them usable for associating cache data, but you also have to store the object to make sure it does not get destroyed in the meantime. That's how you get around the restriction that you cannot use objects as array keys; that's what you need the ID for. But you still have to store a strong reference to the object to make sure it's not garbage collected.
Otherwise, this ID starts referencing some other object. Derick Rethans 14:04 When you say strong reference, that is what PHP references traditionally are? Nikita Popov 14:08 That's the normal reference. Derick Rethans 14:10 Well, because it's been quite some time since it got introduced, from what I understand this has been accepted? Nikita Popov 14:16 It is accepted: 25 to zero. Derick Rethans 14:18 25 to zero. That doesn't happen very often. Nikita Popov 14:22 Most RFCs are maybe not unanimous, but usually either they are 95% accepted, or they are rejected really hard. There is not a lot of middle ground. Derick Rethans 14:34 That's pretty good, though. In any case, we will see this in PHP 8, I suppose, coming out later in the year. Nikita Popov 14:39 That's right. Yes. Derick Rethans 14:41 Well, thank you for taking the time today to talk to me about weak references and preloading, especially on Windows. Nikita Popov 14:50 Thanks for having me, Derick. Derick Rethans 14:52 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
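A minimal sketch, in PHP, of the two caching approaches contrasted in this episode, assuming PHP 8.0 or later (WeakMap is new in PHP 8.0; WeakReference and spl_object_id() already exist in PHP 7.4). Entity, WeakCache and IdKeyedCache are hypothetical names used only for illustration, not part of PHP or of any ORM. The WeakMap-backed cache relies on the ArrayAccess, Countable and Traversable behaviour Nikita describes and loses its entry automatically once the key object is destroyed; the spl_object_id()-keyed cache has to hold a strong reference so the ID is not reused, which keeps the object alive.

```php
<?php
// Requires PHP 8.0+ (WeakMap, constructor promotion, mixed).
// Entity, WeakCache and IdKeyedCache are hypothetical illustration names,
// e.g. standing in for ORM-managed records and per-record cached data.

final class Entity
{
    public function __construct(public int $id) {}
}

// Cache backed by a WeakMap: an entry vanishes when its key object is destroyed.
final class WeakCache
{
    private WeakMap $map;

    public function __construct()
    {
        $this->map = new WeakMap();
    }

    public function remember(object $key, callable $compute): mixed
    {
        // ArrayAccess with an object key: isset() and [] both work.
        if (!isset($this->map[$key])) {
            $this->map[$key] = $compute($key);
        }
        return $this->map[$key];
    }

    public function size(): int
    {
        return count($this->map); // Countable
    }

    public function ids(): array
    {
        $ids = [];
        // Traversable: foreach yields the object keys together with their values.
        foreach ($this->map as $entity => $value) {
            $ids[] = $entity->id;
        }
        return $ids;
    }
}

// Pre-WeakMap pattern: key by spl_object_id() and keep a strong reference,
// so the ID cannot be reused by a different object while it is cached.
final class IdKeyedCache
{
    private array $objects = []; // strong references, keyed by object ID
    private array $data = [];    // cached values, keyed by object ID

    public function remember(object $key, callable $compute): mixed
    {
        $id = spl_object_id($key);
        if (!array_key_exists($id, $this->data)) {
            $this->objects[$id] = $key;
            $this->data[$id] = $compute($key);
        }
        return $this->data[$id];
    }

    public function size(): int
    {
        return count($this->data);
    }
}

$summary = fn (Entity $e): string => "Entity #{$e->id}";

// WeakMap-backed cache: entity 1 disappears once the application drops it.
$weak = new WeakCache();
$a = new Entity(1);
$b = new Entity(2);
$weak->remember($a, $summary);
$weak->remember($b, $summary);
unset($a);
var_dump($weak->size()); // int(1)
var_dump($weak->ids());  // only entity 2 remains

// ID-keyed cache: the strong reference keeps entity 3 alive after unset().
$strong = new IdKeyedCache();
$c = new Entity(3);
$strong->remember($c, $summary);
$probe = WeakReference::create($c); // only used to observe $c's lifetime
unset($c);
var_dump($strong->size());        // int(1)
var_dump($probe->get() !== null); // bool(true)
```

The trade-off is the one from the conversation: the ID-keyed version is safe against ID reuse only because it never lets cached objects go, which is exactly the kind of memory growth that weak maps are meant to avoid.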

PHP Internals News: Episode 37: PHP 7.4 Celebrations!

November 28, 2019

PHP Internals News: Episode 37: PHP 7.4 Celebrations! London, UK Thursday, November 28th 2019, 09:37 GMT In this episode of "PHP Internals News" we are celebrating the new features that are part of this release. Instead of talking with a single guest about an RFC or feature, I have asked followers of the @PHPIntNews Twitter account to record a snippet about their own favourite PHP 7.4 features. With thanks to Benjamin Eberlei, George Banyard, James Titcumb, Mark Randall, Matthew Setter, Nikita Popov, Vincent Dechenaux, and William Pinaud. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: Typed Properties RFC: Foreign Function Interface Episode 2: PHP Compiler and FFI RFC: Preloading RFC: Allow throwing exceptions from __toString() Episode 14: __toString exceptions RFC: Spread Operator in Array Expression Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 36: What didn’t make it into PHP 7.4?

November 21, 2019

PHP Internals News: Episode 36: What didn’t make it into PHP 7.4? London, UK Thursday, November 21st 2019, 09:36 GMT In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 7.4, but did not end up making the cut. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes Episode 1: Saner string to number comparisons Episode 5: Comprehensions Episodes 8 and 23: Deprecating Short Open Tags Episode 10: LSP and Operator Precedence Episode 15: base_convert() Improvements Episode 18: Strict Operator Directive Episode 21: str_starts_with() and friends Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 35: Cryptography

November 07, 2019

PHP Internals News: Episode 35: Cryptography London, UK Thursday, November 7th 2019, 09:35 GMT In this episode of "PHP Internals News" I chat with Scott Arciszewski (Website, Twitter, GitHub, Patreon) about the recent PHP-FPM vulnerability and the state of cryptography in PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes PHP-FPM bug PHP Bug Report CVE-2019-11043 Exploit Code ECB Penguin Padding Oracle Attack Constant Time Encoding with ext/sodium: sodium_base642bin() Ciphersweet Supersingular isogeny key exchange Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 34: Deprecate Backtick Operator

October 31, 2019

PHP Internals News: Episode 34: Deprecate Backtick Operator London, UK Thursday, October 31st 2019, 09:34 GMT In this episode of "PHP Internals News" I chat with Mark Randall (GitHub) about an RFC that he proposed that would deprecate the backtick operator. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: Deprecate Backtick Operator Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 33: Union Types

October 24, 2019

PHP Internals News: Episode 33: Union Types London, UK Thursday, October 24th 2019, 09:33 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about an RFC that he created to add union types to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: Union Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 32: Writing Extensions

October 17, 2019

PHP Internals News: Episode 32: Writing Extensions London, UK Thursday, October 17th 2019, 09:32 BST In this episode of "PHP Internals News" I chat with James Titcumb (Twitter, GitHub, Website, LinkedIn) about writing PHP extensions commercially. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes PHP Internals Book PHP at the Core: A Hacker's Guide PECL Development List archive Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 31: DOM Living Standard API

October 10, 2019

PHP Internals News: Episode 31: DOM Living Standard API London, UK Thursday, October 10th 2019, 09:31 BST In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about an RFC that he produced that would implement the new "DOM Living Standard API". The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: DOM Living Standard API Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 30: Object Initializer

October 03, 2019

PHP Internals News: Episode 30: Object Initializer London, UK Thursday, October 3rd 2019, 09:30 BST In this episode of "PHP Internals News" I chat with Michał Brzuchalski (Twitter, GitHub, Website) about an RFC that he produced that would add a new "Object Initializer" syntax. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: Object Initializer Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0

PHP Internals News: Episode 29: Reclassifying Engine Warnings

September 26, 2019

PHP Internals News: Episode 29: Reclassifying Engine Warnings London, UK Thursday, September 26th 2019, 09:29 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub) about an RFC that he produced to reclassify engine warnings. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: Reclassifying engine warnings Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0