
Exercise solution:
Finding the encoding of a given text file


The following program generates a UTF-8 encoded text file:

using System;
using System.IO;
using System.Text;

public class TextWriterProg{

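  public static void Main(){
    // str and strEquiv denote the same Danish string; strEquiv spells the
    // non-ASCII letters as Unicode escapes (\u00E6 = æ, \u00E5 = å, \u00F8 = ø).
    string str =      "A æ u å æ ø i æ å",
           strEquiv = "A \u00E6 u \u00E5 \u00E6 \u00F8 i \u00E6 \u00E5";
  
    // new UTF8Encoding() emits no byte order mark (BOM), so the file
    // contains only the UTF-8 bytes of the text and the line terminator.
    TextWriter tw = new StreamWriter(                         // UTF-8
                       new FileStream("guess-me.txt", FileMode.Create),
                       new UTF8Encoding());
  
    tw.WriteLine(strEquiv);
    tw.Close();                                               // flushes and closes the file
  }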

}

The program writes the Danish string strEquiv, followed by a newline, to the file guess-me.txt.
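To see what the writer actually put on disk, one can dump the raw bytes of guess-me.txt. The following is a minimal sketch, not part of the original exercise; the class ByteDumpProg and the method Hex are hypothetical names, and the program assumes the file name is given as the first argument (defaulting to guess-me.txt).

```csharp
using System;
using System.IO;

// Sketch: print the raw bytes of a file in hexadecimal.
// ByteDumpProg and Hex are hypothetical names, not part of the original exercise.
public class ByteDumpProg{

  // Formats bytes as space-separated two-digit hex values, e.g. "C3 A6".
  public static string Hex(byte[] bytes){
    return BitConverter.ToString(bytes).Replace("-", " ");
  }

  public static void Main(string[] args){
    // Defaults to the file produced by TextWriterProg.
    string path = args.Length > 0 ? args[0] : "guess-me.txt";
    Console.WriteLine(Hex(File.ReadAllBytes(path)));
  }
}
```

By the UTF-8 encoding rules, æ, å and ø (U+00E6, U+00E5, U+00F8) become the two-byte sequences C3 A6, C3 A5 and C3 B8, so the dump should read 41 20 C3 A6 20 75 20 C3 A5 20 C3 A6 20 C3 B8 20 69 20 C3 A6 20 C3 A5, followed by the line terminator (0D 0A on Windows, 0A on Linux and macOS). There is no leading EF BB BF, because new UTF8Encoding() writes no byte order mark.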

Here follows the program that reads guess-me.txt with six different encodings:

using System;
using System.IO;
using System.Text;

public class TextReaderProg{

  public static void Main(string[] args){
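    // Six candidate encodings to try on the same bytes.
    // UTF7Encoding is marked obsolete as of .NET 5; it still compiles, with a warning.
    Encoding[] encodings = new Encoding[] 
      {Encoding.GetEncoding("iso-8859-1"),   // Latin-1
       new UTF8Encoding(),
       new UTF7Encoding(),
       new UnicodeEncoding(),                 // UTF-16, little-endian
       new UTF32Encoding(),                   // UTF-32, little-endian
       new ASCIIEncoding() };

    foreach(Encoding e in encodings){
      using(TextReader sr = new StreamReader(args[0], e)){
        Console.WriteLine("Encoding {0}:", e);   // prints the class name of e
        Console.WriteLine(sr.ReadToEnd());        // the whole file, decoded with e
        Console.WriteLine();
      }
    }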
  }

}
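One detail matters for the reading program: the StreamReader(string, Encoding) constructor also looks for a byte order mark, and if it finds one, the detected encoding takes precedence over the one passed in. Because guess-me.txt has no BOM, each reader really decodes with its given encoding. The sketch below illustrates the difference; it uses an in-memory stream instead of a file, and the names BomDemo and ReadBack are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: a UTF-8 byte order mark overrides the encoding passed to StreamReader.
// BomDemo and ReadBack are hypothetical names, not part of the original exercise.
public class BomDemo{

  // Writes text with the given UTF-8 encoding, then reads it back claiming ASCII.
  public static string ReadBack(string text, UTF8Encoding writeEncoding){
    var ms = new MemoryStream();
    var writer = new StreamWriter(ms, writeEncoding);
    writer.Write(text);
    writer.Flush();        // pushes the preamble (if any) and the encoded text into ms
    ms.Position = 0;
    // StreamReader(Stream, Encoding) detects byte order marks by default.
    using (var reader = new StreamReader(ms, Encoding.ASCII)){
      return reader.ReadToEnd();
    }
  }

  public static void Main(){
    string s = "A \u00E6 u \u00E5";
    Console.WriteLine(ReadBack(s, new UTF8Encoding(true)) == s);   // BOM written
    Console.WriteLine(ReadBack(s, new UTF8Encoding(false)) == s);  // no BOM
  }
}
```

Run as is, it should print True (the BOM makes the reader switch to UTF-8 although it was given ASCII) followed by False (without a BOM the bytes are decoded as ASCII, giving ?? for each Danish letter).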

Notice that the program expects the name of the text file, guess-me.txt, as a command-line argument.

The output is something like:

Encoding System.Text.Latin1Encoding:
A Ã¦ u Ã¥ Ã¦ Ã¸ i Ã¦ Ã¥


Encoding System.Text.UTF8Encoding:
A æ u å æ ø i æ å


Encoding System.Text.UTF7Encoding:
A Ã¦ u Ã¥ Ã¦ Ã¸ i Ã¦ Ã¥


Encoding System.Text.UnicodeEncoding:
?ꛃ??쌠?ꛃ쌠??ꛃ쌠?

Encoding System.Text.UTF32Encoding:
??????

Encoding System.Text.ASCIIEncoding:
A ?? u ?? ?? ?? i ?? ??
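The wrong readings can be traced back to the two bytes C3 A6 that UTF-8 uses for æ. The small sketch below (hypothetical class DecodeDemo, not part of the original solution) decodes just these two bytes with some of the encodings and prints the resulting UTF-16 code units. Latin-1 maps each byte to a character of its own, ASCII replaces each byte above 0x7F with '?', and little-endian UTF-16 combines the pair into the single character U+A6C3 (ꛃ), the character that recurs in the UnicodeEncoding line.

```csharp
using System;
using System.Text;

// Sketch: decode the two UTF-8 bytes of 'æ' (C3 A6) with several encodings.
// DecodeDemo and Show are hypothetical names, not part of the original exercise.
public class DecodeDemo{

  // Decodes bytes with e and lists the resulting chars as U+XXXX values.
  public static string Show(Encoding e, byte[] bytes){
    var sb = new StringBuilder();
    foreach (char c in e.GetString(bytes))
      sb.AppendFormat("U+{0:X4} ", (int)c);
    return sb.ToString().TrimEnd();
  }

  public static void Main(){
    byte[] ae = Encoding.UTF8.GetBytes("\u00E6");                      // C3 A6
    Console.WriteLine(Show(Encoding.UTF8, ae));                        // U+00E6
    Console.WriteLine(Show(Encoding.GetEncoding("iso-8859-1"), ae));   // U+00C3 U+00A6
    Console.WriteLine(Show(Encoding.ASCII, ae));                       // U+003F U+003F
    Console.WriteLine(Show(Encoding.Unicode, ae));                     // U+A6C3
  }
}
```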


Since the UTF-8 reading reproduces the original Danish string exactly, we conclude that guess-me.txt is UTF-8 encoded.
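Trying encodings by eye can be complemented by a mechanical check: a UTF8Encoding constructed with throwOnInvalidBytes set to true throws a DecoderFallbackException on malformed input. The sketch below (hypothetical names Utf8CheckProg and IsValidUtf8) uses this as a heuristic. It cannot prove that a file was written as UTF-8: a pure ASCII file is also valid UTF-8, and some Latin-1 byte sequences happen to be valid UTF-8 as well.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: test whether a file's bytes are well-formed UTF-8 by decoding strictly.
// Utf8CheckProg and IsValidUtf8 are hypothetical names, not part of the original exercise.
public class Utf8CheckProg{

  public static bool IsValidUtf8(byte[] bytes){
    // Arguments: no BOM on encoding; throw on invalid bytes instead of substituting U+FFFD.
    var strict = new UTF8Encoding(false, true);
    try {
      strict.GetString(bytes);
      return true;
    } catch (DecoderFallbackException) {
      return false;
    }
  }

  public static void Main(string[] args){
    string path = args.Length > 0 ? args[0] : "guess-me.txt";
    Console.WriteLine("{0} is valid UTF-8: {1}", path, IsValidUtf8(File.ReadAllBytes(path)));
  }
}
```

For instance, the Latin-1 encoding of "A æ" is 41 20 E6, which is rejected: E6 announces a three-byte UTF-8 sequence whose continuation bytes never arrive.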