--------- commenter.cgi ----------
#!/usr/bin/perl -w
use CGI;
use MIME::Lite;
use Encode qw(encode encode_utf8 );
use utf8;
$query = new CGI;
$comments = $query->param('comments');
# If I collect a UTF-8 charset subject line it becomes
# gobbledegook once mailed
$subject_line = $query->param('subject');
# but if I define a UTF-8 character string here it works
# in the subject line of the resulting mail
# $subject_line = "μερικές ελληνικές λέξεις";
MIME::Lite->send ("sendmail", "/usr/bin/sendmail -t -oi");
$msg = MIME::Lite->new (
From => "\"Commenter\" <no-reply\@example.com>",
To => 'comments@example.com',
Type =>'multipart/mixed',
Subject => encode( 'MIME-Header', $subject_line)
);
$body = "Comments: $comments";
$msg->attach (Type =>'text/plain; carset=utf-8', Data => $body);
$msg->send ();
# either way, utf-8 character input print in the browser
print "Content-type: text/html\n\n";
print $comments;
In comp.lang.perl.misc, Tuxedo <tuxedo@mailinator.net> wrote:
--------- commenter.cgi ----------
#!/usr/bin/perl -w
use CGI;
use MIME::Lite;
use Encode qw(encode encode_utf8 );
use utf8;
Versions? I have Perl v5.24.3 handy.
$query = new CGI;
$comments = $query->param('comments');
# If I collect a UTF-8 charset subject line it becomes
# gobbledegook once mailed
$subject_line = $query->param('subject');
# but if I define a UTF-8 character string here it works
# in the subject line of the resulting mail
# $subject_line = "μεÏικÎÏ ÎµÎ»Î»Î·Î½Î¹ÎºÎÏ Î»ÎξειÏ";
That's curious. I'd look at what encoding your query string has.
MIME::Lite->send ("sendmail", "/usr/bin/sendmail -t -oi");
$msg = MIME::Lite->new (
From => "\"Commenter\" <no-reply\@example.com>",
To => 'comments@example.com',
Type =>'multipart/mixed',
Subject => encode( 'MIME-Header', $subject_line)
);
My Encode module does not document a 'MIME-Header' encoding. I use MIME::EncWords for that.
use MIME::EncWords qw( encode_mimeword );
...
Subject => encode_mimeword( $subject_line, 'B', 'UTF-8')
Where 'B' is for base64 and 'Q' would be 'quoted-printable'.
$body = "Comments: $comments";
$msg->attach (Type =>'text/plain; carset=utf-8', Data => $body);
$msg->send ();
# either way, utf-8 character input print in the browser
print "Content-type: text/html\n\n";
print $comments;
I also use taint checking on CGI. You'll need to clean up the PATH,
etc, for that.
Elijah
------
didn't check versions of the modules
Mime-Version: 1.0[...]
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8Bit
It just reads some Greek words. Odd that it does not show on your end.
My news reader and news posting window are set to UTF-8.
I try again: μερικές ελληνικές λέξεις. Does anyone see the Greek UTF-8
characters?
Mime-Version: 1.0[...]
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8Bit
User-Agent: KNode/4.14.10
# $subject_line = "μερικές ελληνικές λέξεις";
Mime-Version: 1.0[...]
Content-Type: text/plain; charset="UTF-8"
User-Agent: Vectrex rn 2.1 (beta)
In comp.lang.perl.misc, Tuxedo <tuxedo@mailinator.net> wrote:
# $subject_line = "μεÏικÎÏ‚ ελληνικÎÏ‚ λÎξεις";
That's curious. I'd look at what encoding your query string has.
It just reads some Greek words. Odd that it does not show on your end. My news reader and news posting window are set to UTF-8.
$msg = MIME::Lite->new (
From => "\"Commenter\" <no-reply\@example.com>",
To => 'comments@example.com',
Type =>'multipart/mixed',
Subject => encode( 'MIME-Header', $subject_line)
);
My Encode module does not document a 'MIME-Header' encoding. I use MIME::EncWords for that.
use MIME::EncWords qw( encode_mimeword );
...
Subject => encode_mimeword( $subject_line, 'B', 'UTF-8')
Where 'B' is for base64 and 'Q' would be 'quoted-printable'.
Thanks for the tips. I will give MIME::EncWords a try.
Another tool I've used for something similar is Email::MIME::RFC2047::Encoder
I'm not sure where the UTF-8 conversion fails in the mail or CGI.
Hello,
How can I process the input of an HTML form in UTF-8 with CGI and pass it intact through MIME-Lite's sending procedure?
In comp.lang.perl.misc, Tuxedo <tuxedo@mailinator.net> wrote:
It just reads some Greek words. Odd that it does not show on your end. My
news reader and news posting window are set to UTF-8.
I saw the Greek originally, but I had an editor hiccup that clearly
screwed that up. Sorry. That's also why I picked B instead of Q for
the encoding. Q is best suited for mostly ASCII content like French
or German.
$msg = MIME::Lite->new (
From => "\"Commenter\" <no-reply\@example.com>",
To => 'comments@example.com',
Type =>'multipart/mixed',
Subject => encode( 'MIME-Header', $subject_line)
);
My Encode module does not document a 'MIME-Header' encoding. I use
MIME::EncWords for that.
use MIME::EncWords qw( encode_mimeword );
...
Subject => encode_mimeword( $subject_line, 'B', 'UTF-8')
Where 'B' is for base64 and 'Q' would be 'quoted-printable'.
Thanks for the tips. I will give MIME::EncWords a try.
Another tool I've used for something similar is
Email::MIME::RFC2047::Encoder
I don't know that one, but from the name it's doing the same thing.
RFC-2047 defines "MIME encoded words" for putting non-ASCII content
into 7-bit clean mail headers.
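For reference, a minimal sketch of RFC 2047 encoded words using only core Encode, on the assumption that Encode::MIME::Header is installed alongside it (it registers the 'MIME-Header', 'MIME-B' and 'MIME-Q' encoding names). The sample subject is the one from this thread. This is not MIME::EncWords' API, just an illustration of the B and Q forms.

```perl
use strict;
use warnings;
use utf8;                         # the literal below is UTF-8 source text
use Encode qw(encode decode);

my $subject = "μερικές ελληνικές λέξεις";   # a character string, not octets

my $b64 = encode('MIME-B', $subject);   # base64 encoded words
my $qp  = encode('MIME-Q', $subject);   # quoted-printable style encoded words

# Both results are plain ASCII and may be folded into several encoded words.
print "$b64\n$qp\n";

# Decoding either form gives back the original characters.
print "round trip ok\n"
    if decode('MIME-Header', $b64) eq $subject
    && decode('MIME-Header', $qp)  eq $subject;
```

For mostly non-ASCII text such as Greek the B form tends to be shorter, which fits the remark above about picking B over Q.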
I'm not sure where the UTF-8 conversion fails in the mail or CGI.
Try adding some logging. Sometimes for CGI stuff I find it easiest
to open my own log file and write to that.
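A rough sketch of that kind of logging, assuming a writable log path (the path, the COMMENTER_LOG variable and both helper names are made up for illustration). It records the length, the internal UTF-8 flag and a hex dump, which is usually enough to tell undecoded octets from decoded characters.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a string so octets and characters can be told apart.
sub describe_string {
    my ($label, $s) = @_;
    my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
    return sprintf '%s: length=%d utf8_flag=%d hex=%s',
        $label, length $s, (utf8::is_utf8($s) ? 1 : 0), $hex;
}

# Hypothetical helper: append one timestamped line to a private log file.
sub log_line {
    my ($path, $line) = @_;
    open my $log, '>>', $path or die "cannot open $path: $!";
    print {$log} scalar(localtime), " $line\n";
    close $log or die "cannot close $path: $!";
}

my $log_path = $ENV{COMMENTER_LOG} || '/tmp/commenter-debug.log';   # assumed location

my $from_form   = "\xCE\xBC";    # the letter mu as two undecoded UTF-8 octets
my $from_source = "\x{3BC}";     # the same letter as one character

log_line($log_path, describe_string('form',   $from_form));
log_line($log_path, describe_string('source', $from_source));
```

In the CGI script the same call could be made on the value returned by $query->param('subject') right before it is handed to the mailer.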
I see from other follow-ups this is Perl 5.10.x. I have a 5.10.1 here,
and I tried the code, but I don't have MIME::Lite or MIME::EncWords
for that install.
Elijah
------
only willing to try so hard to duplicate an environment
*SKIP*
$msg->attach (Type =>'text/plain; carset=utf-8', Data => $body);
That looks like copy-paste, but "carset"?
Anyway, as you see for yourself: if you pass non-latin1 contents
properly stored in Perl's internal encoding (due to 'use utf8') to
MIME-Lite (which is apparently aware of Perl's internal encoding) you are
fine. I don't remember 5.10 now (and digging through Changes isn't
feasible); *if* your perl were younger (like 5.14) I'd suggest replacing
'use utf8' with 'use feature qw/ unicode_strings /' instead (but that
might not be an option).
Anyway, I suggest (unless you absolutely need 'use utf8' for something)
dropping 'use utf8' and adding 'use Encode qw/ decode_utf8 /'. What you need
is *decoding* the strings that come out of CGI.pm. Apparently, CGI.pm
doesn't decode whatever comes from the network; it turns out that you have
to do the decoding yourself. Better yet, 'use Encode qw/ decode /', figure out
which encoding came with the request that CGI.pm dealt with, and then
decode properly (there are more encodings out there than just UTF-8).
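A minimal sketch of the decode-first approach described above, assuming the form is submitted as UTF-8. The $raw_subject variable stands in for what $query->param('subject') would return, so the snippet runs without CGI.pm. The header is encoded with core Encode's 'MIME-Header', as in the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for $query->param('subject'): undecoded UTF-8 octets for two Greek letters.
my $raw_subject = "\xCE\xBC\xCE\xB5";

# Step 1: turn the octets into a Perl character string.
my $subject = decode('UTF-8', $raw_subject);

# Step 2: encode explicitly for each destination.
my $header_value = encode('MIME-Header', $subject);           # ASCII, for the Subject header
my $body_octets  = encode('UTF-8', "Comments: $subject");     # octets for a charset=utf-8 part

printf "%d octets in, %d characters after decoding\n",
    length $raw_subject, length $subject;
print "Subject: $header_value\n";
```

The point of the design is that the script works on characters internally and only converts to octets at the edges (mail header, mail body, HTML output).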
I would like all contents, including mail headers (Subject, Reply-To
and From), to be UTF-8 compatible. So far, I have only managed
to print a form's input to the browser but not to encode it correctly
through the mail procedure.
On 9/15/21 2:19 PM, Tuxedo wrote:
I would like all contents, including mail headers (Subject, Reply-To
and From), to be UTF-8 compatible. So far, I have only managed
to print a form's input to the browser but not to encode it correctly
through the mail procedure.
I'm late to the party, but I wanted to add the following comment:
Email headers use a different (and I believe incompatible) encoding than
the MIME body of the email.
I'd have to go back and (re)read the pertinent RFCs for how to correctly
encode non-ASCII characters in email headers. But I'm quite certain
that traditional MIME encoding methods will /not/ work.
Eric Pozharski wrote:
$msg->attach (Type =>'text/plain; carset=utf-8', Data => $body);
                                  ^^^^^^
That looks like copy-paste, but "carset"?
I'm not sure where I got that from but yes, it's likely copy-paste :-)
with <si2876$1on$1@solani.org> Tuxedo wrote:
Eric Pozharski wrote:
                                  vvvvvv
$msg->attach (Type =>'text/plain; carset=utf-8', Data => $body);
                                  ^^^^^^
That looks like copy-paste, but "carset"?
I'm not sure where I got that from but yes, it's likely copy-paste :-)
What about "carset" then?
*CUT*
Eric Pozharski wrote:
with <si2876$1on$1@solani.org> Tuxedo wrote:
Eric Pozharski wrote:
                                  vvvvvv
$msg->attach (Type =>'text/plain; carset=utf-8', Data => $body);
                                  ^^^^^^
That looks like copy-paste, but "carset"?
I'm not sure where I got that from but yes, it's likely copy-paste :-)
What about "carset" then?
I'm not sure what you mean?
*CUT*
Meanwhile, I tested another sending procedure instead of MIME-Lite, namely
Mail::Sender, but have the same difficulty with UTF-8 in email
transmission for data going through CGI.
I can however transmit a string intact via mail if it's hard-coded in the perl script:
use Mail::Sender;
use utf8;
$subject = "μερικές ελληνικές λέξεις";
my $sender = new Mail::Sender;
from => $from_email,
to => $to_email,
subject => $subject,
charset => 'utf-8',
});
$sender->Close();
But if passed through a CGI form, like this:
use CGI;
use utf8;
use Email::MIME::RFC2047::Encoder;
$subject = $query->param('subject');
my $utf8_subject_encoder = Email::MIME::RFC2047::Encoder->new;
my $utf8_encoded_subject = $utf8_subject_encoder->encode_text($subject);
from => $from_email,
to => $to_email,
subject => $utf8_encoded_subject,
charset => 'utf-8',
});
$sender->Close();
... the subject will show something like follows in a resulting email
subject line:
μεÏικÎÏ ÎµÎ»Î»Î·Î½Î¹ÎºÎÏ Î»ÎξειÏ
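One plausible reading of that output, sketched under the assumption that the form value arrives as undecoded UTF-8 octets and is then encoded to UTF-8 a second time somewhere in the mail path. Each octet is then treated as a Latin-1 character and becomes two octets, which a mail reader shows as "Î¼", "Îµ" and so on.

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 octets for the first three letters of the subject,
# as an undecoded form field would hold them.
my $raw = "\xCE\xBC\xCE\xB5\xCF\x81";

# Encoding again treats every octet as a Latin-1 character and expands it.
my $double = encode('UTF-8', $raw);

printf "%d octets before, %d after\n", length $raw, length $double;
print join(' ', map { sprintf('%02X', ord($_)) } split //, $double), "\n";
# prints: 6 octets before, 12 after
# prints: C3 8E C2 BC C3 8E C2 B5 C3 8F C2 81
```

C3 8E is "Î" and C2 BC is "¼" in UTF-8, which matches the start of the garbled subject line above.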
The form collecting the "μερικές ελληνικές λέξεις" string uses
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
And the proper "μερικές ελληνικές λέξεις" prints fine on the
CGI-generated HTML result page after being passed through a form.
The output page has <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
It just won't mail for some mysterious reason, maybe relating to CGI.
"use Email::MIME::RFC2047::Encoder;" is meant to encode for email headers
as far as I understand.
Yet, I can pass "μερικές ελληνικές λέξεις" into the subject line of an
email without the Encoder procedure, as long as I declare 'use utf8;' at
the top of the script. As said, the string only emails intact if it is
literally coded into the perl script and not passed as a variable through CGI.
The correct UTF-8 characters will display fine on a CGI result page
whether hard-coded in the script or passed through a form.
The result was the same with MIME-Lite, so it's not the mailer that's the issue. I'm not sure exactly what is.
Tuxedo
My issue can be reduced to a difference in the submitted form data
compared with the fixed typed-in string in my perl code, although both flavors of UTF-8 characters appear identical in a browser window through Perl.
One works to email and the other does not. For example, I test with a
simple HTML form submit:
<!DOCTYPE html>
<html><head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form ENCTYPE="multipart/form-data" method="post" action="compare.pl">
<input type="text" name="subject" size="30" value="μερικές ελληνικές λέξεις">
<input type="submit" value="Submit">
</form>
</body>
</html>
And it's submitted to the following compare.pl script:
#!/usr/bin/perl -w
use CGI;
use utf8;
use Email::MIME::RFC2047::Encoder;
my $fixed_subject;
# Only the following passed directly through an email
# subject intact:
$fixed_subject = "μερικές ελληνικές λέξεις";
my $query = new CGI;
# This value will display correctly in a web browser
# but not after having been sent in a subject
# line of an email via Mime-Lite or other:
my $submitted_subject = $query->param('subject');
# The following $utf8_encoded_submitted_subject will not display correctly
# in a browser or email subject line:
my $utf8_submitted_subject_encoder = Email::MIME::RFC2047::Encoder->new;
my $utf8_encoded_submitted_subject =
    $utf8_submitted_subject_encoder->encode_text($submitted_subject);
print "Content-type: text/html\n\n";
print "<!DOCTYPE html>\n";
print "<html><head>\n";
print "<title>Compare</title>\n";
print "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n";
print "</head>\n";
print "<body>\n";
print "\$fixed_subject: $fixed_subject\n";
print "<hr>";
print "\$submitted_subject: $submitted_subject\n";
print "<hr>";
print "\$utf8_encoded_submitted_subject: $utf8_encoded_submitted_subject\n";
print "</body></html>\n";
I leave out the email code here but, as said, the $fixed_subject typed
directly into the perl code works in the subject line of a mail sent
through MIME::Lite or Mail::Sender, while the $submitted_subject that was
collected as a form value through CGI does not.
What exactly happens to $submitted_subject in the process and how can
it be made identical to the $fixed_subject string?
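A small sketch of the difference, assuming the form value arrives as undecoded UTF-8 octets. Here $submitted_subject is built by hand as a stand-in for $query->param('subject'). Both strings look the same in a browser because printing a character string to a plain filehandle writes out the same UTF-8 octets (with a "Wide character" warning under -w), but inside Perl they differ in length and compare unequal until the submitted one is decoded.

```perl
use strict;
use warnings;
use utf8;                        # the literal below is UTF-8 source text
use Encode qw(encode decode);

my $fixed_subject = "μερικές ελληνικές λέξεις";          # 24 characters

# Stand-in for the CGI parameter: the same text as raw UTF-8 octets.
my $submitted_subject = encode('UTF-8', $fixed_subject);  # 46 octets

printf "fixed:     %d characters\n", length $fixed_subject;
printf "submitted: %d octets\n",     length $submitted_subject;
print  "equal before decoding? ",
    ($submitted_subject eq $fixed_subject ? "yes" : "no"), "\n";

# Decoding turns the octets into the same character string as the literal.
my $decoded_subject = decode('UTF-8', $submitted_subject);
print  "equal after decoding?  ",
    ($decoded_subject eq $fixed_subject ? "yes" : "no"), "\n";
```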
What you need
is *decoding* strings that come out of CGI.pm. Apparently, CGI.pm
doesn't decode whatever comes from network, turns out that's you who has
to do it (decoding). Better yet, 'use Encode qw/ decode /', figure out
what encoding was with the request that CGI.pm dealt with and then
decode properly (there are more encodings outside than just UTF-8).
How exactly can 'use Encode qw/ decode /' figure out in perl what encoding was used when the data is user-submitted via CGI.pm? It can be any set of UTF-8 characters. On the HTML form I define:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Thanks,
Tuxedo
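Encode cannot detect the encoding on its own. Browsers normally submit a form in the encoding of the page that contained it, so with charset=utf-8 on the form page UTF-8 is the reasonable first guess. A hedged sketch of one common heuristic follows: try a strict UTF-8 decode and fall back to a legacy encoding if the octets are not valid UTF-8. The helper name decode_param and the cp1252 default are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode form octets, preferring strict UTF-8.
sub decode_param {
    my ($octets, $fallback) = @_;
    return undef unless defined $octets;
    $fallback = 'cp1252' unless defined $fallback;   # assumed legacy default

    # FB_CROAK makes invalid UTF-8 die instead of being silently replaced;
    # LEAVE_SRC keeps decode() from modifying $octets.
    my $text = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    return decode($fallback, $octets);
}

my $greek  = decode_param("\xCE\xBC");    # valid UTF-8: one character, U+03BC
my $legacy = decode_param("caf\xE9");     # not valid UTF-8: decoded as cp1252

printf "greek: %d character, legacy: %d characters\n",
    length $greek, length $legacy;
```

Usage in the script would be along the lines of decode_param(scalar $query->param('subject')). CGI.pm also documents a -utf8 import flag that decodes parameters for you, with caveats around file uploads.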
Eric Pozharski wrote:
*SKIP*
Anyway, as you see for yourself: if you pass non-latin1 contents
properly stored in Perl's internal encoding (due 'use utf8') to
MIME-Lite (which is Perl's internal encoding aware, apparently) you
are fine. I don't remember 5.10 now (and digging through Changes
isn't feasible), *if* you'd be younger (like 5.14) I'd suggest to
replace 'use utf8' with 'use feature qw/ unicode_strings /' instead
(but it might be not an option).
Anyway, I suggest, (unless you absolutely need 'use utf8' for
something) drop 'use utf8' and add 'use Encode qw/ decode_utf8 /'.
If I drop 'use utf8;' and replace it with:
use Encode qw/ decode_utf8 /;
What can be done to properly decode/encode user-submitted UTF-8 data
so that it is the same as if typed directly in the
perl code and can pass through email?
The line 'use feature qw/ unicode_strings /;' caused an error on the
perl version I have.