

You can always take a look at how, for example, Windows 3.11 and earlier did it with their *.rtf file format and their “write.exe” editor / viewer / renderer (if you want to call it that).


It doesn’t depend on the programming language, but it must be something with a visual debugger. You know, that stuff where you can see the current line of your source code highlighted, press a key to step into, step over and so on. You can see the values inside your variables. You can also change your variables mid-run, right from the debugger.
Because you spend 20% of your time writing bugs and the other 80% debugging them. At least make it a pleasant experience (no printf-style debugging).
Back in the day I was using Turbo Pascal, Delphi, Visual Basic, C#, Java, PHP with Zend, JavaScript; today I’m using Visual C++.


Yes, I know this. It took me a long time to figure it out. My entire life I focused on technical skills / programming / math / logic, as I deemed them most important for the job. I was like: “Hey, if you cannot program, why do you work as a programmer (you stupido)?” Only a few years ago I realized that even as a programmer (as opposed to a salesman) you really need those “meh” soft skills. And that they are really important and I should not call them “meh”. I’m very good at solving problems, improving a product’s performance and memory consumption, discovering and fixing bugs and security vulnerabilities. But I’m very, very bad at communicating my skills and at communicating with people in general. I’m not able to politely tell people that their idea is bad, I just say “that’s stupid”. And I’m mostly/sometimes right (if I’m not 100% sure, I don’t say anything), but the damage caused by the way I say it is often irreversible.

That post of mine about the job interview and CV was half joke and half reality. I just freeze/stutter when I’m asked something that is obvious because it is written in my CV. I’m immediately thinking “Did he not receive the CV?” or “Did he not read it?” “Why the fuck is he not prepared for the call? Why are we wasting time asking me what should be obvious, because I sent it in advance?” I’m more robot than human. Put me in front of a problem and forget to tell me that it is impossible to solve … and I will solve it. But easy small talk … disaster. Communicating what the problem really was … disaster. Communicating how I solved it … disaster. “It was not working before and now it works fine, what the hell do you want from me now?”

Yes, I’m very bad in a team, in a collective. I didn’t know the reason why, but for a few years now I have known the root of the problem. It’s not that everybody around me is stupid and doesn’t know basic stuff (what I consider basic); it’s me being unable to communicate with other humans.


The interview starts … the interviewer asks me “Tell me about yourself.” … I respond “Did you receive my CV? I put all the important details about me … right there. What questions do you have about my past jobs?” The interviewer encourages me again to tell him about myself, my past projects, etc. … Me: Awkward silence. … Me to myself: Dafuq? Should I read the CV to him from top to bottom OR WHAT?
Yes, but (there is always a but) it does not apply if you implement off-line encryption. Meaning no on-line service encrypting / decrypting attacker-provided data (such as SSL / TLS / HTTPS). Meaning you are running the cipher on your own computer with your own keys / plaintexts / ciphertexts. There is nobody to snoop on timing differences or power usage differences when you use a different key / different ciphertext. In that case, I would suggest this is fine. The only one who can attack you is yourself. In fact, I implemented AES from scratch in the C89 language; the same source code is also compatible with C++14 constexpr evaluation mode. I also implemented the Serpent cipher. Serpent was an AES candidate back when there was no AES yet and Rijndael was not yet named AES. The code is on my GitHub page.
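A minimal sketch of the C89 + C++14 constexpr trick (my own illustration, not the actual GitHub code; MY_CONSTEXPR and aes_xtime are made-up names, though xtime itself is a real AES building block):

```cpp
/* The same source builds as plain C89 and as C++14 with constexpr: the
   keyword is hidden behind a macro that expands to nothing for a C compiler. */
#ifdef __cplusplus
#define MY_CONSTEXPR constexpr
#else
#define MY_CONSTEXPR
#endif

/* xtime: multiplication by 2 in GF(2^8), a building block of AES MixColumns.
   A single return statement keeps it valid even under C++11 constexpr rules. */
static MY_CONSTEXPR unsigned char aes_xtime(unsigned char const a)
{
    return (unsigned char)(((a << 1) ^ ((a & 0x80u) != 0u ? 0x1bu : 0x00u)) & 0xffu);
}
```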


std::vector::reserve + std::vector::push_back in a loop is sub-optimal, because every push_back needs to check whether a re-allocation is required, but that re-allocation never comes.
std::vector::resize + std::vector::operator[] in a loop is also sub-optimal, because resize value-initializes all elements only for them to be overwritten soon anyway.
This article’s author suggests push_back_unchecked.
I suggest std::vector::insert with a pair of random access iterators with a custom dereference operator that does the “transform element” or “generate element” functionality (a sketch follows below). The standard will hopefully have resize_and_overwrite soon.
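A minimal sketch of that idea (my own illustration; square_iterator and make_squares are made-up names). In practice a value-returning operator* is tolerated by mainstream standard libraries here, and a forward iterator tag is already enough for insert to compute the count and allocate exactly once:

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Generator "iterator": dereferencing yields i*i instead of reading memory.
struct square_iterator
{
    using iterator_category = std::forward_iterator_tag; // enough for a single allocation
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int const*;
    using reference = int; // returned by value, this is the "generate element" part

    std::ptrdiff_t m_i;

    int operator*() const { return static_cast<int>(m_i * m_i); }
    square_iterator& operator++() { ++m_i; return *this; }
    square_iterator operator++(int) { square_iterator tmp{*this}; ++m_i; return tmp; }
    friend bool operator==(square_iterator const& a, square_iterator const& b){ return a.m_i == b.m_i; }
    friend bool operator!=(square_iterator const& a, square_iterator const& b){ return a.m_i != b.m_i; }
};

std::vector<int> make_squares(std::ptrdiff_t const n)
{
    std::vector<int> v;
    // One allocation, each element constructed in place, no per-element
    // capacity check and no zero-fill.
    v.insert(v.end(), square_iterator{0}, square_iterator{n});
    return v;
}
```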
Moar discussion:
https://codingnest.com/the-little-things-the-missing-performance-in-std-vector/
https://twitter.com/horenmar_ctu/status/1695823724673466532
https://twitter.com/horenmar_ctu/status/1695331079165489161
https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/
https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/jy21hgd/
https://twitter.com/basit_ayantunde/status/1644895468399337473
https://twitter.com/MarekKnapek/status/1645272474517422081
https://www.reddit.com/r/cpp/comments/cno9ep/improving_stdvector/


Another alternative to C++ exceptions (besides return codes) is to use a global (or thread-local) variable. This is exactly what C and POSIX do with errno, and what Windows does with GetLastError. Of course, this has its own pros and cons.
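A minimal sketch of the pattern (my own illustration; my_error, g_last_error and read_positive are made-up names):

```cpp
// Errno-style error reporting: the callee stores an error code in a
// thread-local variable, the caller inspects it after the call.
#include <cstdio>

enum class my_error { none, bad_input };

thread_local my_error g_last_error = my_error::none; // one slot per thread, like errno

int read_positive(int const raw) // hypothetical API: rejects non-positive input
{
    if(raw <= 0)
    {
        g_last_error = my_error::bad_input; // failure is reported out-of-band
        return 0;
    }
    g_last_error = my_error::none;
    return raw;
}

int main()
{
    int const v = read_positive(-5);
    if(g_last_error != my_error::none)
    {
        std::puts("read_positive failed");
    }
    return v;
}
```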


For this purpose I would search for a Linux-specific thing: a RAM-based storage that accepts data writes and even flushes/fsyncs, and then lazily writes them to the HDD/SSD. There would be a problem in case of power loss, but the gained performance… Unfortunately, I don’t know of any such tool.


ABI break when?
I know some unnamed big customers want ABI stability. But come on … VS2015, VS2017, VS2019 and VS2022 all binary-compatible with each other if used with a new enough linker? They all share the pre-defined macro _MSC_VER 19xx and the VC++ toolset version number 14.xx. That is too much holding back of progress on the performance and correctness fronts. Eight years is enough.
Customers need to learn that they cannot rely on the ABI stability of STL-provided classes, cos guess what: The Holy Standard doesn’t specify any. Toolchain vendors do. This also applies to MFC/ATL/whatnot distributed as part of Visual Studio. Remember the GCC copy-on-write string ABI problem? We already have technology to help migrate between ABI versions: one is called COM, another is pimpl, another is a version number as the first member of a struct or as the first function parameter. I bet there are many more out there.
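A minimal sketch of the version-number technique (my own illustration; widget_params and create_widget are made-up names), in the same spirit as the cbSize idiom of many Win32 structs:

```cpp
#include <cstdio>

struct widget_params
{
    unsigned m_version; // always the first member, never moved
    int m_width;        // since version 1
    int m_height;       // since version 1
    // version 2 fields would be appended here, never inserted or reordered
};

void create_widget(widget_params const* const p) // hypothetical library export
{
    if(p->m_version >= 1u)
    {
        std::printf("w=%d h=%d\n", p->m_width, p->m_height);
    }
    // if(p->m_version >= 2u){ read the newer fields here }
}

int main()
{
    widget_params params = {};
    params.m_version = 1u; // the version this caller was built against
    params.m_width = 640;
    params.m_height = 480;
    create_widget(&params);
    return 0;
}
```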


Makes sense, how would you represent floor(1e42) or ceil(1e120) as an integer? It would not fit into a 32-bit (unsigned) or 31-bit (signed) integer. Not even into a 64-bit integer, which tops out at about 1.8e19.
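A quick illustration of the mismatch (a sketch, nothing more):

```cpp
// floor() keeps the double type because its result may lie far outside any
// integer range; converting such a value to an integer would be undefined behavior.
#include <cmath>
#include <cstdint>
#include <cstdio>

int main()
{
    double const x = std::floor(1e42); // still 1e42, still a double
    std::printf("UINT64_MAX ~ %.3e\nfloor(1e42) = %.3e\n",
        static_cast<double>(UINT64_MAX), x);
    // static_cast<std::uint64_t>(x) would be UB: the value is not representable.
    return 0;
}
```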


Think of the advanced features of WinRAR not being accessible without a valid licence key. Ehm, WinRAR distributes the same binary to both licensed and unlicensed users, unlocking the features with a license key or with a crack (the equivalent of NOPing the if out). What if WinRAR instead distributed a different binary to each licensed user, with the advanced features encrypted by a per-user key? A crack or keygen would need to use some particular user’s binary together with their license. Easily traceable. Or the crack would need to be applied once and the decrypted features / code distributed from then on.


Think of password-protected access to something, anything. Instead of checking if(password == some_constant){...} or if(hash(password) == precomputed_hash){...}, you encrypt that something. The first variant has the disadvantage that some_constant is stored inside your binary, thus the password is visible to anybody. The second variant … the same, the hash of the password is stored inside the binary and could be brute-forced or rainbow-tabled. Both variants share the disadvantage that there is a run-time check refusing access to some data, but the data is available in the binary anyway. Just open the program in a debugger or in a hex editor and NOP the if out. With my approach the data is unreadable without the correct password. The app could not be convinced / persuaded to provide the data in any way without the password.
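A toy sketch of the idea (my own illustration; a real implementation would use a vetted KDF such as PBKDF2/scrypt/Argon2 and an authenticated cipher such as AES-GCM, not std::hash and XOR):

```cpp
// There is no password comparison to NOP out: a wrong password simply
// produces garbage bytes instead of the plaintext.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy "KDF": stretch std::hash output into a repeating keystream.
static std::uint8_t toy_key_byte(std::string const& password, std::size_t const i)
{
    std::size_t const h = std::hash<std::string>{}(password);
    return static_cast<std::uint8_t>(h >> ((i % sizeof(h)) * 8u));
}

// Toy XOR "cipher": the same function both encrypts and decrypts.
static std::vector<std::uint8_t> toy_crypt(std::string const& password,
    std::vector<std::uint8_t> const& data)
{
    std::vector<std::uint8_t> out(data.size());
    for(std::size_t i = 0; i != data.size(); ++i)
    {
        out[i] = static_cast<std::uint8_t>(data[i] ^ toy_key_byte(password, i));
    }
    return out;
}

int main()
{
    std::vector<std::uint8_t> const secret = toy_crypt("hunter2", {'h', 'i'});
    std::vector<std::uint8_t> const good = toy_crypt("hunter2", secret); // "hi" again
    std::vector<std::uint8_t> const bad = toy_crypt("letmein", secret);  // garbage
    return good == bad ? 1 : 0; // different passwords, different bytes
}
```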


This is true in the real world too, not only in the computer or programming world. Think of the steam engine, it enabled sooo much progress in other fields. Also the invention of the lathe and of precision measuring and engineering. Before that, the invention of “simple machines” such as the pulley, the bolt/screw, the windlass, the lever. The same in math, physics, chemistry and so on.


Oversimplified:
More questions:
The same style of question would be: How do you create a hammer if, in order to create a hammer, you need a hammer? How was the first hammer created? Answer: By a more primitive hammer, or by something that is not a hammer but almost works as one.
For more info, read about bootstrapping compilers, or about “Reflections on Trusting Trust” by Ken Thompson.


So what happened:
Am I right?
I’m an old-school developer/programmer and it seems that the web is a peace of sheet. Basic security stuff violated:
Am I right? Correct me if I’m wrong.
Again, the web is a peace of sheet. This would never happen in a desktop/server application. Any of the bullet points above would prevent this from happening, even if the previous bullet point failed to do its job. Am I too naïve? Maybe.
Marek.
No! It is one code point that can be encoded by up to 4 UTF-8 code units, not one glyph. Glyphs do not map to code points one-to-one. One glyph can be encoded by more than one code point (and each code point can be encoded by more than one code unit). Code points are a Unicode thing, code units are a Unicode-encoding thing, glyphs are a font+Unicode thing. For example, the glyph á might be a single code point or two code points. A single code point because this is a common letter in some languages and was used in computers before Unicode was invented; two code points because it might be the base letter a followed by a combining diacritic mark. Not all diacritic letters have a single-code-point variant. Also emojis: they are a single glyph but multiple code points, for example the skin tone modifiers for the various face emojis, or male+female characters combined into a single glyph forming a family glyph. Country flags are also a single glyph but multiple code points. Unicode is BIG, there is A LOT of stuff in it. For example, sorting based on the user’s language, or conversion to upper/lower case, is also not trivial (google the Turkish i).
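A small sketch of the á example (my own illustration):

```cpp
// The same on-screen glyph "á" as one code point (U+00E1, precomposed) vs.
// two code points (U+0061 'a' + U+0301 combining acute), in UTF-8 code units.
#include <cstdio>

int main()
{
    unsigned char const precomposed[] = { 0xc3u, 0xa1u };       // U+00E1 -> 2 code units
    unsigned char const decomposed[] = { 0x61u, 0xccu, 0x81u }; // U+0061 U+0301 -> 3 code units

    std::printf("precomposed: %u code units, decomposed: %u code units\n",
        static_cast<unsigned>(sizeof(precomposed)), static_cast<unsigned>(sizeof(decomposed)));
    // Both render as the same glyph, yet compare unequal byte-wise; comparing
    // them correctly requires Unicode normalization (NFC/NFD).
    return 0;
}
```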