Module 6: Encoding
Tirocinante Consorzi
Created on September 14, 2023
preface (1/2)
Encoding in IT
Welcome to the module on Encoding in IT, an essential guide to understanding the concept of encoding and its various applications. This document aims to provide a comprehensive introduction to encoding, covering topics such as ASCII encoding, URL encoding, HTML encoding, base64 encoding, hexadecimal encoding, and the use of encoding in cyber attacks.
preface (2/2)
By the end of this module, you will have reached the following goals:
- You know what encoding means and what its purpose is
- You know the ASCII encoding scheme and can use it
- You know URL encoding and can interpret it
- You know the different kinds of HTML encoding and can use them
- You know how base64 encoding works and can decode it
- You know how hexadecimal encoding works and what it is used for
- You know where encoding is used in cyber attacks
What is encoding? (1/2)
Around the world, the quantity five is represented in different forms:
- decimal numerical system: 5
- roman numeral system: V
- hieroglyph writing: |||||
- with 🖐️
- ….
What is encoding? (2/2)
Encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage. Decoding is the opposite process: the conversion of an encoded format back into the original sequence of characters.
The need for encoding
When we send data, we cannot be sure that it will be interpreted in the format we intended. So, we send data encoded in a format that both parties understand. It is important that encoding schemes are accurate: the decoded data should have exactly the same content as the original data. Encoding itself is NOT A SECURITY solution, because anyone who knows the scheme can decode the data without a key.
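As a minimal sketch of these two points, the snippet below encodes a text to bytes and back, and then shows that a base64-encoded form can be reversed without any secret. The sample message and the variable names are illustrative assumptions, not part of the module.

```python
import base64

original = "Meet at 5"  # sample message, chosen only for illustration

# Encode the characters to bytes (UTF-8) and decode them again.
encoded_bytes = original.encode("utf-8")
decoded_text = encoded_bytes.decode("utf-8")

# Base64 makes the bytes look unreadable, but no key is needed to reverse it.
b64_text = base64.b64encode(encoded_bytes).decode("ascii")
recovered = base64.b64decode(b64_text).decode("utf-8")

print(decoded_text == original)  # the round trip preserves the content
print(b64_text, "->", recovered)
```

Because anyone can run the decoding step, an encoded message offers no confidentiality.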
History of encoding: Morse code (1/2)
Throughout history, many encoding schemes have been used. An example is Morse code:
- sequences of two signal durations, called dots and dashes (or dits and dahs)
- used in telegraphy
- international Morse code encodes the 26 basic Latin letters A through Z, as well as the digits and some punctuation
- there is no distinction between upper and lower case letters
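The properties listed above can be sketched in a few lines of Python. The table holds the commonly published International Morse codes for the letters A to Z only; the function name `encode_morse` and the choice of a space as letter separator are assumptions made for this example.

```python
# International Morse code for the 26 basic Latin letters.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def encode_morse(word: str) -> str:
    """Encode a single word; upper and lower case map to the same code."""
    return " ".join(MORSE[letter] for letter in word.upper())

print(encode_morse("sos"))
```

Calling `upper()` before the lookup reflects the fact that Morse code has no separate codes for lower-case letters.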
History of encoding: Morse code (2/2)
Where is encoding used? (1/2)
Encoding can be used:
- to convert information to the appropriate form for transmission.
- in data storage and data processing
- in data compression and decompression
Where is encoding used? (2/2)
In the picture you see two different representations of the same data. The first one has no parsing characters, the second one has parsing characters. Which representation is better to use depends on the technology.
What is ASCII encoding?
ASCII encoding
- stands for American Standard Code for Information Interchange
- a type of code that is used for converting characters into numeric codes
- used in computers, telecommunications equipment and other devices
- before ASCII, each computer manufacturer represented letters, numerals, and other characters in its own way
- as a result, different models of computers could not communicate with each other
Original ASCII Table (1/2)
The original ASCII code
- based on the (modern) English alphabet.
- encodes 128 specified characters as seven-bit integers
- 95 of them are printable
- 33 non-printing control codes which originated with Teletype machines
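The numbers above can be checked with a short Python sketch. The variable names are assumptions; the code ranges used (32 to 126 for printable characters, 0 to 31 plus 127 for control codes) follow the standard ASCII table.

```python
# Printable ASCII characters: codes 32 (space) through 126 (~).
printable = [chr(code) for code in range(32, 127)]

# Control codes: 0 through 31, plus 127 (DEL).
control_codes = list(range(0, 32)) + [127]

# Every ASCII character fits in seven bits.
code_for_a = ord("A")                   # character -> number
seven_bits = format(code_for_a, "07b")  # number -> seven-bit pattern

print(len(printable), len(control_codes))
print(code_for_a, seven_bits, chr(code_for_a))
```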
Original ASCII Table (2/2)
Extended ASCII tables and codepages
ASCII was created for the English alphabet
- What about other alphabets?
- ASCII was extended from seven to eight bits, growing from 128 to 256 characters
- Different regions of the world chose to use this extra space differently
- Codepages were born
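To illustrate why code pages caused confusion, the sketch below decodes the same single byte with three different code pages. The three code pages picked here (Latin-1, the original IBM PC code page 437, and Windows-1251 for Cyrillic) are only examples chosen for this illustration.

```python
raw = bytes([0xE9])  # one byte in the "extra" range above 127

as_latin1 = raw.decode("latin-1")  # Western European code page
as_cp437 = raw.decode("cp437")     # original IBM PC code page
as_cp1251 = raw.decode("cp1251")   # Cyrillic code page

# The same byte turns into a different character on each system.
print(as_latin1, as_cp437, as_cp1251)
```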
Unicode (1/2)
Different codepages? But every device should be able to display the same information! So, Unicode was born!
- is an effort to include all characters from all currently and historically used human languages into single character enumeration
- is effectively one large single code page
Unicode (2/2)
These days, the Unicode standard, maintained by the Unicode Consortium, defines values for over 128,000 characters. It has several character encoding forms:
- UTF-8: widely used in email systems and on the internet
- UTF-16: used by systems such as the Microsoft Windows API, the Java programming language and JavaScript
- UTF-32: represents every Unicode character as a single fixed-size 32-bit number, which makes it space-inefficient and rarely used
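A small sketch can make the difference between the three encoding forms concrete by counting the bytes each one needs for the same character. The helper name `encoded_lengths` and the sample characters are assumptions; the little-endian codec variants are used so that no byte order mark is counted.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 and UTF-16 use a variable number of bytes, while UTF-32 always uses four, which is why it is simple but wasteful for mostly-ASCII text.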
URI and URL
URI
- identifies a resource and differentiates it from others by using a name, location, or both
- identifies the web address or location of a unique resource.
URI (1/7)
The URI generic syntax consists of components organized hierarchically in order of decreasing significance from left to right.
URI (2/7)
The first component, the scheme, is mandatory; it defines the addressing system. It starts with a letter, followed by any combination of letters, digits, plus signs, periods, or hyphens, and is terminated by a colon. The most common URI schemes include HTTP, HTTPS, FTP, mailto, and file.
URI (3/7)
The authority component is an optional component preceded by a double slash and terminated by a slash, a question mark, or a hash symbol. It consists of three sub-components:
- Userinfo
- Host
- Port
URI (4/7)
The path component contains a sequence of segments that describes the location of a resource in a directory structure. When an authority is present, the path must either be empty or begin with a slash. For example, telnet://192.0.2.16:80/ is a valid URI whose path consists of nothing but the slash, since there is no indication of a specific resource location.
URI (5/7)
The query component is an optional component that contains a query string of non-hierarchical data. It is often a string of key=value pairs. This component is preceded by a question mark. For example, if the URI is https://example.org/test/test1?search=test-question#part2, the query component is search=test-question.
URI (6/7)
The fragment component includes a fragment identifier that provides the direction to a secondary resource. It refers to a different section of the primary resource. A fragment is preceded by a hash symbol and terminated by the end of a URI. For instance, the fragment component from https://example.org/test/test1?search=test-question#part2 is part2.
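The components described above can be inspected with Python's standard `urllib.parse` module. This is a minimal sketch using the example URI from the text; only the variable name `parts` is an assumption.

```python
from urllib.parse import urlsplit

# Split the example URI into its generic components.
parts = urlsplit("https://example.org/test/test1?search=test-question#part2")

print("scheme:   ", parts.scheme)
print("authority:", parts.netloc)
print("path:     ", parts.path)
print("query:    ", parts.query)
print("fragment: ", parts.fragment)
```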
URI examples
mailto://mailboxX.com:6267/complaints/there?name=help#nose. This URI contains a scheme name, an authority with host and port, a path, a query and a fragment.
telnet://192.0.2.16:80/. In this example, “telnet” is the scheme name and the IP address and port after the double slash make up the authority. The path contains nothing beyond the slash, which is why no characters come after it.
URL (1/2)
URL
- abbreviation of Uniform Resource Locator
- is a specific type of URI
- not only identifies the resource but also tells you how to access it or where it is located.
URL (2/2)
Each URL follows the generic URI syntax and therefore has the same structure as a URI. Below is an example of URL syntax: https://www.example.com/forum/questions/?tag=networking&order=newest#top
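As a sketch of how the example URL breaks down, the snippet below splits it and turns the query string into key=value pairs with the standard `urllib.parse` module. The variable names are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com/forum/questions/?tag=networking&order=newest#top"
parts = urlsplit(url)

# parse_qs maps each key to a list, because a key may appear more than once.
params = parse_qs(parts.query)

print(parts.hostname, parts.path, parts.fragment)
print(params)
```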
The need for URL encoding
- A URL is composed of ASCII characters
- Some ASCII characters are not allowed to be placed directly within URLs (backspace, tab, ...)
- Some characters have a special meaning within URLs (?, /, #, ...)
- Unsafe characters are also not allowed to be placed directly within URLs (", <, >, ...)
URL encoding (1/2)
- URL encoding converts reserved, unsafe, and non-ASCII characters in URLs to a format that is universally accepted and understood by all web browsers and servers.
- It first converts the character to one or more bytes.
- Then each byte is represented by two hexadecimal digits preceded by a percent sign (%).
- The percent sign is used as an escape character.
- URL encoding is also called percent encoding since it uses percent sign (%) as an escape character.
URL encoding (2/2)
Example
- ASCII value of space character in decimal is 32
- converted to hex comes out to be 20
- we just precede the hexadecimal representation with a percent sign (%)
- this gives us the URL encoded value - %20
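The steps above can be sketched by hand and compared with the standard library. The helper `percent_encode_char` is a hypothetical name for this illustration, and it assumes UTF-8 is used to turn a character into bytes, which is the usual choice on the web.

```python
from urllib.parse import quote, unquote

def percent_encode_char(ch: str) -> str:
    """Convert a character to bytes, then write each byte as %XX."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))

print(percent_encode_char(" "))  # the space example from the text
print(percent_encode_char("é"))  # a non-ASCII character needs two bytes

# The standard library applies the same rule, leaving safe characters alone.
print(quote("a b?c", safe=""))
print(unquote("%20") == " ")
```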
HTML encoding
- The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser.
- There are various characters that are part of the HTML markup itself (such as < and >).
- To use these within the document as content you need to HTML encode them by using HTML character codes.
HTML numeric character reference
- A first way for character encoding in HTML is to make use of numeric character references.
- A numeric character reference in HTML refers to a character by its Unicode code point, and uses the format &#nnnn; or &#xhhhh; where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form. For example, &#60; and &#x3C; both refer to the character <.
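As a sketch, both forms of a numeric character reference can be built from the code point and decoded again with the standard `html` module. The helper name `numeric_refs` is an assumption made for this example.

```python
import html

def numeric_refs(ch: str) -> tuple:
    """Return the decimal and hexadecimal numeric reference for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"

decimal_ref, hex_ref = numeric_refs("<")
print(decimal_ref, hex_ref)

# Both references decode back to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == "<")
```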
HTML character entity references
- A second way to use encoding in HTML is by referring to a character by the name of an entity which has the desired character as its replacement text.
- It has the format &name; where name is a case-sensitive alphanumeric string.
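A minimal sketch with Python's standard `html` module shows named entities in use. The sample markup string is an assumption chosen for illustration.

```python
import html
from html.entities import name2codepoint

markup = '<a href="x">Tom & Jerry</a>'  # sample content, not from the module

# Replace characters that are part of HTML markup with entity references.
escaped = html.escape(markup)
print(escaped)

# Decoding restores the original text.
print(html.unescape(escaped) == markup)

# Entity names are case-sensitive: these are two different characters.
print(html.unescape("&Aacute;"), html.unescape("&aacute;"))
print(name2codepoint["lt"])  # code point behind the entity name "lt"
```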
Base 64 encoding (1/2)
- Base64 is used to encode binary data as printable text.
- This allows you to transport binary over protocols or mediums that cannot handle binary data formats and require simple text.
- Base64 is a group of binary-to-text encoding schemes that process binary data (a sequence of 8-bit bytes) in groups of 24 bits, each of which is represented by four 6-bit base64 digits.
Base 64 encoding (2/2)
The encoding process follows these steps:
- The base64 encoding algorithm receives an input stream of bytes (8 bits each)
- It processes the input from left to right and divides the input into 24-bit groups by concatenating three 8-bit bytes.
- These 24-bit groups are then treated as 4 concatenated 6-bit groups.
- Finally, each 6-bit group is converted to a single character using the base64 table.
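The steps above can be sketched as follows. This is an illustrative implementation that only handles input whose length is a multiple of three bytes (no padding); the names `ALPHABET`, `encode_group` and `encode_complete` are assumptions, and the result is compared with Python's own `base64` module.

```python
import base64
import string

# Standard base64 table: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes (one 24-bit group) as four base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    # Concatenate the three 8-bit bytes into one 24-bit number.
    group = int.from_bytes(three_bytes, "big")
    # Treat the 24 bits as four 6-bit values, from left to right.
    sextets = [(group >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Look each 6-bit value up in the base64 table.
    return "".join(ALPHABET[value] for value in sextets)

def encode_complete(data: bytes) -> str:
    """Encode input whose length is a multiple of three bytes."""
    return "".join(encode_group(data[i:i + 3]) for i in range(0, len(data), 3))

print(encode_group(b"Man"))
print(encode_complete(b"ManMan") == base64.b64encode(b"ManMan").decode("ascii"))
```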
Example of base64 table
Base64 padding
What is padding? During base64 encoding, the last 24-bit group may not have enough bits. There are two cases:
- If the group has only 8 bits of input data, we pad it with 16 zero bits. The last 2 characters are then replaced with 2 equal signs (==)
- If the group has only 16 bits of input data, we pad it with 8 zero bits. The last character is then replaced with 1 equal sign (=)
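The two padding cases can be sketched by following the rule literally: fill the group with zero bytes, encode it, and replace the trailing characters with equal signs. The function name `encode_last_group` is an assumption, and the output is checked against the standard `base64` module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_last_group(tail: bytes) -> str:
    """Encode a final group of 1, 2 or 3 bytes, adding '=' padding if needed."""
    missing = 3 - len(tail)  # number of bytes that have to be padded
    group = int.from_bytes(tail + b"\x00" * missing, "big")
    chars = [ALPHABET[(group >> shift) & 0b111111] for shift in (18, 12, 6, 0)]
    # Replace the characters that come only from padding with '='.
    return "".join(chars[:4 - missing]) + "=" * missing

for tail in (b"M", b"Ma", b"Man"):
    print(tail, encode_last_group(tail), base64.b64encode(tail).decode("ascii"))
```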
Hexadecimal encoding
- Hexadecimal encoding is also called base16 encoding.
- It uses 16 distinct symbols.
- The hexadecimal symbols 0 to 9 are used to represent the decimal values from 0 to 9
- The hexadecimal symbols A to F (case insensitive) are used to represent the decimal values from 10 to 15
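A short sketch with Python built-ins shows hexadecimal encoding of bytes and numbers. The sample data and variable names are assumptions.

```python
data = b"Hi"  # sample bytes, chosen for illustration

# Each byte becomes two hexadecimal digits.
hex_text = data.hex()
restored = bytes.fromhex(hex_text)
print(hex_text, restored)

# The symbols A to F stand for 10 to 15 and are case insensitive.
print(int("ff", 16), int("FF", 16))
print(format(10, "X"), format(15, "X"))
```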
Hexadecimal notation table
Misuse of URL encoding - Path Traversal (1/3)
Path traversal is an attack
- that exploits weak access control implementations on the server side, particularly for file access
- in which an attacker tries to access restricted files by injecting unexpected input into requests to the website.
Misuse of URL encoding - Path Traversal (2/3)
For example, we have a public website which is accessible via http://mysecuresite.com/public. If path traversal is possible, the attacker can try to reach other files on the server. If, for example, the website is hosted on a Linux server, the attacker can try to read the local user file via http://mysecuresite.com/public/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd, which is the same as http://mysecuresite.com/public/../../../../../etc/passwd, because %2e is the URL-encoded form of a period.
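The sketch below shows what happens to such a request path once it is URL-decoded and normalized. It uses only the path part of the example, and the variable names are assumptions; a real web server may process paths differently.

```python
import posixpath
from urllib.parse import unquote

requested = "/public/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd"

# URL decoding turns every %2e back into a period.
decoded = unquote(requested)
print(decoded)

# Normalizing the path resolves the ".." segments.
print(posixpath.normpath(decoded))
```

A filter that only looks for the literal characters ../ in the raw request would miss the encoded variant.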
Misuse of URL encoding - Path Traversal (3/3)
How to protect yourself against path traversal?
- Sanitize the user input. For example, to mitigate the attack mentioned above, we must decode the user input and then validate that it does not contain path traversal sequences such as ../
- Restrict the web server's access to other files on the system
- Use safelisting, which consists of creating a list of paths that can be accessed safely
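One possible way to apply the first two measures is sketched below: decode the input, resolve it against the public directory, and reject anything that ends up outside it. The directory `/var/www/public` and the function name `resolve_public_path` are hypothetical, and the sketch does not cover every case (for example symbolic links or repeated encoding).

```python
import posixpath
from urllib.parse import unquote

PUBLIC_ROOT = "/var/www/public"  # hypothetical directory served to users

def resolve_public_path(requested: str) -> str:
    """Return the full path for a request, or raise if it leaves PUBLIC_ROOT."""
    decoded = unquote(requested)  # decode before validating
    candidate = posixpath.normpath(
        posixpath.join(PUBLIC_ROOT, decoded.lstrip("/"))
    )
    inside = candidate == PUBLIC_ROOT or candidate.startswith(PUBLIC_ROOT + "/")
    if not inside:
        raise PermissionError(f"path escapes the public directory: {requested}")
    return candidate

print(resolve_public_path("index.html"))
try:
    resolve_public_path("%2e%2e/%2e%2e/etc/passwd")
except PermissionError as error:
    print("blocked:", error)
```

Checking the resolved path rather than the raw input means the check works the same for encoded and plain traversal sequences.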
Encoding as evasion technique (1/2)
Encoding can be used to evade malware detection:
- commands are encoded so that they cannot be read in plain text
- to make evasion stronger, the malware is encoded several times
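From an analyst's point of view, layered encoding can be sketched with a harmless command. The example assumes base64 is the encoding used and that the number of layers is known; the function names `encode_layers` and `decode_layers` and the sample command are assumptions.

```python
import base64

def encode_layers(text: str, layers: int) -> str:
    """Apply base64 encoding repeatedly, as layered malware encoding does."""
    data = text.encode("utf-8")
    for _ in range(layers):
        data = base64.b64encode(data)
    return data.decode("ascii")

def decode_layers(blob: str, layers: int) -> str:
    """Peel off the same number of base64 layers to recover the plain text."""
    data = blob.encode("ascii")
    for _ in range(layers):
        data = base64.b64decode(data, validate=True)
    return data.decode("utf-8")

command = "echo hello"  # harmless stand-in for an encoded command
hidden = encode_layers(command, 3)
print(hidden)
print(decode_layers(hidden, 3))
```

Each extra layer makes the text longer and less recognizable, but it never prevents decoding once the scheme is identified.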
Encoding as evasion technique (2/2)
conclusion
Encoding is essential in IT: it makes it possible to transmit, process and store data reliably. Unfortunately, these techniques are also misused by people with bad intentions and can be part of a cyber attack. When investigating cyber attacks, it is therefore very important that you know and recognize these encoding techniques so that you can fully understand what exactly happened during the attack.